Bug 40472 - blkdev_issue_discard() hangs forever if underlying storage device is removed
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-03 13:17 UTC by Bart Van Assche
Modified: 2012-01-22 19:26 UTC

See Also:
Kernel Version: 3.1-rc0
Subsystem:
Regression: No
Bisected commit-id:



Description Bart Van Assche 2011-08-03 13:17:25 UTC
Apparently blkdev_issue_discard() never times out, not even if the target device has been removed. From the kernel log (triggered by running mkfs.ext4 on an SRP SCSI device node):

sd 15:0:0:0: [sdb] Attached SCSI disk
scsi host15: SRP abort called
scsi host15: SRP reset_device called
scsi host15: ib_srp: SRP reset_host called
scsi host15: ib_srp: connection closed
scsi host15: ib_srp: Got failed path rec status -110
scsi host15: ib_srp: Path record query failed
scsi host15: ib_srp: reconnect failed (-110), removing target port.
sd 15:0:0:0: Device offlined - not ready after error recovery
INFO: task mkfs.ext4:4304 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mkfs.ext4       D 0000000000000000     0  4304   3649 0x00000000
 ffff88006c313b98 0000000000000046 ffffffff813e3038 ffffffff81e6b580
 0000000000000082 000000010003cfdc ffff88006c313fd8 ffff880070fbcbc0
 00000000001d1f40 ffff88006c313fd8 ffff88006c312000 ffff88006c312000
Call Trace:
 [<ffffffff813e3038>] ? schedule+0x628/0x830
 [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
 [<ffffffff810805de>] ? put_lock_stats+0xe/0x40
 [<ffffffff81080e05>] ? lock_release_holdtime+0xb5/0x160
 [<ffffffff813e6ac0>] ? _raw_spin_unlock_irq+0x30/0x60
 [<ffffffff8103f7d9>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff813e28e0>] wait_for_common+0x110/0x160
 [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
 [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
 [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
 [<ffffffff813e2806>] ? wait_for_common+0x36/0x160
 [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
 [<ffffffff8112b7bf>] ? kmem_cache_free+0x6f/0x160
 [<ffffffff811755b7>] block_ioctl+0x47/0x50
 [<ffffffff81151b78>] do_vfs_ioctl+0x98/0x570
 [<ffffffff813e76dc>] ? sysret_check+0x27/0x62
 [<ffffffff8115209f>] sys_ioctl+0x4f/0x80
 [<ffffffff813e76ab>] system_call_fastpath+0x16/0x1b
no locks held by mkfs.ext4/4304.

The hung-task message above kept repeating until the system was rebooted.
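For readers following the trace: frames prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the dependable path is the one formed by the remaining frames. A minimal sketch for pulling that path out of a saved log, assuming the trace is available as plain text; the helper name reliable_frames and the SAMPLE excerpt are illustrative and not part of any kernel tool:

```python
import re

# Matches lines such as " [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310"
# and the unreliable variant " [<ffffffff813e3038>] ? schedule+0x628/0x830".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def reliable_frames(trace_text):
    """Return the function names of frames not marked '?', innermost first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(1) is None:
            frames.append(m.group(2))
    return frames

# Excerpt copied from the call trace in this report.
SAMPLE = """\
 [<ffffffff813e3038>] ? schedule+0x628/0x830
 [<ffffffff813e3835>] schedule_timeout+0x1d5/0x310
 [<ffffffff813e28e0>] wait_for_common+0x110/0x160
 [<ffffffff810425f0>] ? try_to_wake_up+0x2c0/0x2c0
 [<ffffffff813e2a0d>] wait_for_completion+0x1d/0x20
 [<ffffffff811de93a>] blkdev_issue_discard+0x27a/0x2c0
 [<ffffffff811df371>] blkdev_ioctl+0x701/0x760
"""

if __name__ == "__main__":
    print(" <- ".join(reliable_frames(SAMPLE)))
```

Read this way, the ioctl path enters blkdev_issue_discard(), which blocks in wait_for_completion() with no timeout.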

Kernel version:
$ git show | head -n 1
commit ed8f37370d83e695c0a4fa5d5fc7a83ecb947526
$ git describe
v3.0-7216-ged8f373
Comment 1 Bart Van Assche 2011-08-04 08:40:02 UTC
Note: I consider this a bug because, in the state described above, it is impossible to kill the mkfs process or to remove the ib_srp kernel module.
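The "D" after the process name in the hung-task line is the task state: uninterruptible sleep, in which signals (SIGKILL included) are not acted on until the wait ends. A minimal sketch for checking that state from user space, assuming a Linux /proc filesystem; the function names task_state and is_uninterruptible are illustrative:

```python
def task_state(stat_line):
    """Extract the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')' rather than on
    whitespace alone.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]

def is_uninterruptible(pid):
    """Return True if the task is currently in state 'D'."""
    with open(f"/proc/{pid}/stat") as f:
        return task_state(f.read()) == "D"
```

For the task in this report, is_uninterruptible(4304) would be expected to return True for as long as the discard request remains outstanding.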
Comment 2 Andrew Morton 2011-08-26 20:55:31 UTC
Jens seems to be hiding.  Please report it via email to lkml and cc the people who have been working on that code?
Comment 3 Bart Van Assche 2011-08-27 06:26:57 UTC
(In reply to comment #2)
> Jens seems to be hiding.  Please report it via email to lkml and cc the
> people who have been working on that code?

Done - see also http://lkml.org/lkml/2011/8/27/6.
Comment 4 Bart Van Assche 2012-01-22 19:26:35 UTC
Fixed by commit 3308511c93e6ad0d3c58984ecd6e5e57f96b12c8.
