Bug 198023 - blk-mq: hang on usb storages
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-29 01:09 UTC by Alban Browaeys
Modified: 2018-02-04 14:20 UTC

See Also:
Kernel Version: 4.15-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
blk-mq usb issues (105.08 KB, text/plain)
2017-11-29 01:09 UTC, Alban Browaeys
blk trace - plug usb 2.1 stick (20.48 KB, text/plain)
2017-12-01 22:58 UTC, Alban Browaeys

Description Alban Browaeys 2017-11-29 01:09:37 UTC
Created attachment 260917 [details]
blk-mq usb issues

When I unplug a USB stick or drive, I get hung task warnings two minutes afterwards. Then udevd gets blocked too.


This is at upstream master commit bbecb1cfcca55f98cfcb62fa36a32d79975d8816.

Booted with scsi_mod.use_blk_mq=1 on the kernel command line and bfq as the scheduler (udev rule 'ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", ATTR{queue/scheduler}="bfq"').
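
To confirm that the udev rule actually took effect on a given disk, the active scheduler can be read back from sysfs. The sketch below is a minimal illustration, not part of the original report: it assumes the usual sysfs convention where `/sys/block/<dev>/queue/scheduler` lists the available schedulers with the active one in square brackets. The helper names `active_scheduler` and `scheduler_for` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def active_scheduler(sysfs_text: str) -> Optional[str]:
    """Return the bracketed entry, e.g. 'bfq' for 'mq-deadline kyber [bfq] none'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def scheduler_for(device: str) -> Optional[str]:
    """Read the active scheduler for a block device name such as 'sdb'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Mirror the KERNEL=="sd?" match of the udev rule: check every sd? disk.
    for sched_file in sorted(Path("/sys/block").glob("sd?/queue/scheduler")):
        device = sched_file.parent.parent.name
        try:
            print(device, active_scheduler(sched_file.read_text()))
        except OSError as exc:
            print(device, "unreadable:", exc)
```

If a disk does not report bfq here, the hang is less likely to be on the code path discussed in this report.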

Attached two sessions (with a reboot in between).

NB: I switched to this upstream commit as it solves all suspend/resume issues I had (quiesce state).
Comment 1 Lei Ming 2017-12-01 18:13:14 UTC
(In reply to Alban Browaeys from comment #0)
> Created attachment 260917 [details]
> blk-mq usb issues
> 
> When I unplug an usb stick or drive, I then get hung tasks warnings two
> minutes afterwards. Then udevd gets blocked too.

It isn't strange to see hung tasks while unplugging a USB stick. Please
try the following patch:

https://marc.info/?l=linux-block&m=151214241518562&w=2


Comment 2 Lei Ming 2017-12-01 18:24:53 UTC
(In reply to Lei Ming from comment #1)
> (In reply to Alban Browaeys from comment #0)
> > Created attachment 260917 [details]
> > blk-mq usb issues
> > 
> > When I unplug an usb stick or drive, I then get hung tasks warnings two
> > minutes afterwards. Then udevd gets blocked too.
> 
> It isn't strange to see hung tasks during unplug usb stick, please
> try the following patch:
> 
> https://marc.info/?l=linux-block&m=151214241518562&w=2

Just adding a few words to make the situation clear:

1) The above link is a fix for another IO hang that occurs during device
removal; please apply it to be safe.

2) Your IO hang actually happens before the USB stick is unplugged, so
that issue should be related to BFQ. From your log, I believe it is the
same as the report in the following link:

  https://marc.info/?l=linux-block&m=151214241518562&w=2

Please also apply the debug patch in that link and provide us with the log
when this hang happens.

Thanks,
Comment 3 Alban Browaeys 2017-12-01 22:58:55 UTC
Created attachment 260989 [details]
blk trace - plug usb 2.1 stick

The last 3 lines repeat.
I focused on the plug event, which keeps udevd in uninterruptible state.
Comment 4 Lei Ming 2017-12-01 23:49:26 UTC
Comment on attachment 260989 [details]
blk trace - plug usb 2.1 stick

Starting from the following two lines:

  systemd-udevd-1199  [001] ....   462.349524: bfq_insert_requests: insert rq->1
  systemd-udevd-1199  [001] ...2   462.349562: blk_mq_do_dispatch_sched: not get rq, 1

The request (rq->1) stays in BFQ's queue and can no longer be dispatched even
though it has been inserted into the queue, so it is a BFQ issue.

It is the same as the report in the following link:

    https://marc.info/?l=linux-block&m=151214241518562&w=2

We may need Paolo to take a look.
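
For anyone going through a longer capture, the pattern above (an insert followed by a dispatch attempt that gets nothing) can be searched for mechanically. The following is a rough sketch only: it assumes every line follows the same ftrace-style layout as the two lines quoted above, and the names `TraceEvent`, `parse_trace` and `failed_dispatches_after_insert` are made up for illustration. The trailing number in each message comes from the debug patch and is not interpreted here.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class TraceEvent(NamedTuple):
    task: str
    pid: int
    cpu: int
    timestamp: float
    func: str
    message: str


# Layout assumed from the quoted lines: task-pid [cpu] flags timestamp: func: message
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[TraceEvent]:
    """Parse one trace line; return None for lines that do not match the layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return TraceEvent(m["task"], int(m["pid"]), int(m["cpu"]),
                      float(m["ts"]), m["func"], m["msg"].strip())


def parse_trace(text: str) -> List[TraceEvent]:
    return [ev for ev in map(parse_line, text.splitlines()) if ev is not None]


def failed_dispatches_after_insert(
    events: List[TraceEvent],
) -> List[Tuple[TraceEvent, TraceEvent]]:
    """Pair each 'not get rq' dispatch with the most recent BFQ insert before it."""
    pairs = []
    last_insert = None
    for ev in events:
        if ev.func == "bfq_insert_requests":
            last_insert = ev
        elif (ev.func == "blk_mq_do_dispatch_sched"
              and ev.message.startswith("not get rq")
              and last_insert is not None):
            pairs.append((last_insert, ev))
    return pairs


SAMPLE = """\
  systemd-udevd-1199  [001] ....   462.349524: bfq_insert_requests: insert rq->1
  systemd-udevd-1199  [001] ...2   462.349562: blk_mq_do_dispatch_sched: not get rq, 1
"""

if __name__ == "__main__":
    for insert, dispatch in failed_dispatches_after_insert(parse_trace(SAMPLE)):
        print(f"insert at {insert.timestamp}, empty dispatch at {dispatch.timestamp}")
```

A long run of such pairs with no successful dispatch in between would match the stuck-request behaviour described here.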
Comment 5 Sergey Kondakov 2018-02-04 14:20:59 UTC
This just started happening for me: X hangs randomly within half an hour of uptime after updating to 4.15. At first I thought it had something to do with GPU drivers or another PTI mess-up (there were some known hangs with AMD CPUs, I think), but then I actually read the log:

Feb 04 17:39:56 arsenal.patriots kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
Feb 04 17:39:56 arsenal.patriots kernel: IP: bfqg_stats_update_io_add+0x68/0x100
Feb 04 17:39:56 arsenal.patriots kernel: PGD 0 P4D 0 
Feb 04 17:39:56 arsenal.patriots kernel: Oops: 0000 [#1] PREEMPT SMP
Feb 04 17:39:56 arsenal.patriots kernel: Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device af_packet ts_bm xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp xt_tcpudp ip6t_rpfil
Feb 04 17:39:56 arsenal.patriots kernel:  kvm_amd btusb btrtl btbcm btintel tvaudio msp3400 bluetooth kvm ecdh_generic irqbypass mac_hid ath9k ath9k_common ath9k_hw snd_hda_codec_realtek ath snd_hda_codec_generic 
Feb 04 17:39:56 arsenal.patriots kernel:  l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg nbd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs efivarfs
Feb 04 17:39:56 arsenal.patriots kernel: CPU: 3 PID: 5578 Comm: pool Tainted: G           O     4.15.0-782.gac01747-HSF #1
Feb 04 17:39:56 arsenal.patriots kernel: Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
Feb 04 17:39:56 arsenal.patriots kernel: RIP: 0010:bfqg_stats_update_io_add+0x68/0x100
Feb 04 17:39:56 arsenal.patriots kernel: RSP: 0018:ffffb08ec8e73ad8 EFLAGS: 00010046
Feb 04 17:39:56 arsenal.patriots kernel: RAX: 0000000000000000 RBX: ffff94a970e3d568 RCX: 0000000000000000
Feb 04 17:39:56 arsenal.patriots kernel: RDX: 0000000000000002 RSI: ffffffff9018d5d2 RDI: 0000000000000001
Feb 04 17:39:56 arsenal.patriots kernel: RBP: 0000000000000000 R08: 0000000f00000204 R09: 0000000000000200
Feb 04 17:39:56 arsenal.patriots kernel: R10: 0000000000000380 R11: ffff94a7a5f55900 R12: ffff94a971594ca8
Feb 04 17:39:56 arsenal.patriots kernel: R13: ffff94a971594c00 R14: ffffb08ec8e73ba0 R15: ffff94a971594ca8
Feb 04 17:39:56 arsenal.patriots kernel: FS:  00007fcc2f086700(0000) GS:ffff94a97ecc0000(0000) knlGS:0000000000000000
Feb 04 17:39:56 arsenal.patriots kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 04 17:39:56 arsenal.patriots kernel: CR2: 0000000000000090 CR3: 00000001e4d07000 CR4: 00000000000406e0
Feb 04 17:39:56 arsenal.patriots kernel: Call Trace:
Feb 04 17:39:56 arsenal.patriots kernel:  bfq_insert_requests+0xef/0xfe0
Feb 04 17:39:56 arsenal.patriots kernel:  ? blk_rq_map_user_iov+0xfe/0x210
Feb 04 17:39:56 arsenal.patriots kernel:  blk_mq_sched_insert_request+0x123/0x170
Feb 04 17:39:56 arsenal.patriots kernel:  blk_execute_rq+0x4d/0xa0
Feb 04 17:39:56 arsenal.patriots kernel:  sg_io+0x1be/0x420
Feb 04 17:39:56 arsenal.patriots kernel:  scsi_cmd_ioctl+0x29f/0x470
Feb 04 17:39:56 arsenal.patriots kernel:  sd_ioctl+0xbf/0x1a0 [sd_mod]
Feb 04 17:39:56 arsenal.patriots kernel:  blkdev_ioctl+0x909/0x9d0
Feb 04 17:39:56 arsenal.patriots kernel:  block_ioctl+0x39/0x40
Feb 04 17:39:56 arsenal.patriots kernel:  do_vfs_ioctl+0xa1/0x660
Feb 04 17:39:56 arsenal.patriots kernel:  SyS_ioctl+0x74/0x80
Feb 04 17:39:56 arsenal.patriots kernel:  do_syscall_64+0x6e/0x1a0
Feb 04 17:39:56 arsenal.patriots kernel:  entry_SYSCALL64_slow_path+0x25/0x25
Feb 04 17:39:56 arsenal.patriots kernel: RIP: 0033:0x7fcc32197377
Feb 04 17:39:56 arsenal.patriots kernel: RSP: 002b:00007fcc2f085898 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 04 17:39:56 arsenal.patriots kernel: RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fcc32197377
Feb 04 17:39:56 arsenal.patriots kernel: RDX: 00007fcc2f0858a0 RSI: 0000000000002285 RDI: 000000000000000c
Feb 04 17:39:56 arsenal.patriots kernel: RBP: 00007fcc28045c00 R08: 00007fcc28045c18 R09: 0000000000000200
Feb 04 17:39:56 arsenal.patriots kernel: R10: 00007fcc2f0858a0 R11: 0000000000000246 R12: 00007fcc28045e18
Feb 04 17:39:56 arsenal.patriots kernel: R13: 00007fcc2f085ad0 R14: 0000000000000008 R15: 00007fcc280a2360
Feb 04 17:39:56 arsenal.patriots kernel: Code: 08 06 00 74 69 48 8d bb f8 04 00 00 ba ff ff ff 3f be 01 00 00 00 e8 f8 63 03 00 f6 83 d0 06 00 00 04 75 53 48 8b 83 c0 01 00 00 <4c> 39 a0 90 00 00 00 74 35 49 8b 84
Feb 04 17:39:56 arsenal.patriots kernel: RIP: bfqg_stats_update_io_add+0x68/0x100 RSP: ffffb08ec8e73ad8
Feb 04 17:39:56 arsenal.patriots kernel: CR2: 0000000000000090
Feb 04 17:39:57 arsenal.patriots kernel: ---[ end trace 36d20f7f43a71696 ]---
Feb 04 17:39:57 arsenal.patriots kernel: note: pool[5578] exited with preempt_count 1

No USB sticks around. Just 1 SSD and 5 HDDs with kyber/bfq combination on SATA. I don't think I'll be using bfq any longer...
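
As a reading aid for the oops above: in the `Code:` line the byte in angle brackets marks the start of the faulting instruction. The bytes `4c 39 a0 90 00 00 00` appear to decode as `cmp %r12,0x90(%rax)`, which with RAX = 0 is consistent with the reported fault address CR2 = 0x90. The sketch below pulls the symbol, offset and faulting bytes out of such lines. It is an illustrative helper under those assumptions, not a kernel tool, and the names `FaultSite`, `parse_fault_site` and `faulting_bytes` are hypothetical.

```python
import re
from typing import NamedTuple


class FaultSite(NamedTuple):
    symbol: str
    offset: int  # offset of the faulting instruction inside the function
    size: int    # total size of the function


# Matches the 'symbol+0xOFF/0xSIZE' form used in the IP:/RIP: lines.
SITE_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_fault_site(line: str) -> FaultSite:
    """Extract symbol, offset and function size from an IP:/RIP: line."""
    m = SITE_RE.search(line)
    if m is None:
        raise ValueError("no symbol+offset/size found")
    return FaultSite(m["sym"], int(m["off"], 16), int(m["size"], 16))


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes from the <xx> marker onward in a 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            tail = [tok[1:-1]] + tokens[i + 1:]
            return bytes(int(t, 16) for t in tail)
    raise ValueError("no <xx> marker in Code: line")


if __name__ == "__main__":
    site = parse_fault_site("RIP: 0010:bfqg_stats_update_io_add+0x68/0x100")
    print(site.symbol, hex(site.offset), hex(site.size))
    code = "Code: 48 8b 83 c0 01 00 00 <4c> 39 a0 90 00 00 00 74 35"
    print(faulting_bytes(code).hex(" "))
```

The extracted bytes can then be fed to a disassembler to check the decoding against a kernel built with the same configuration.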
