Bug 208907 - [fstests generic/074 on xfs]: 5.7.10 fails with a hung task on
Summary: [fstests generic/074 on xfs]: 5.7.10 fails with a hung task on
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: XFS (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: FileSystem/XFS Default Virtual Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-14 18:43 UTC by Luis Chamberlain
Modified: 2020-08-14 22:41 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.7.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Luis Chamberlain 2020-08-14 18:43:18 UTC
Should be extremely easy to reproduce in 5 commands with kdevops [0], leave everything with the defaults, and then just enable fstests.

[0] https://github.com/mcgrof/kdevops

make bringup
make fstests

Just ssh to kdevops-xfs and run:

cd /var/lib/xfstests/
./gendisks.sh -m
./check generic/074

Aug 14 18:27:34 kdevops-xfs-dev kernel: XFS (loop16): Mounting V5 Filesystem
Aug 14 18:27:34 kdevops-xfs-dev kernel: XFS (loop16): Ending clean mount
Aug 14 18:27:34 kdevops-xfs-dev kernel: xfs filesystem being mounted at /media/test supports timestamps until 2038 (0x7fffffff)
Aug 14 18:28:16 kdevops-xfs-dev kernel: nvme nvme1: I/O 128 QID 2 timeout, aborting
Aug 14 18:28:16 kdevops-xfs-dev kernel: nvme nvme1: Abort status: 0x4001
Aug 14 18:28:47 kdevops-xfs-dev kernel: nvme nvme1: I/O 128 QID 2 timeout, reset controller
Aug 14 18:28:54 kdevops-xfs-dev kernel: sched: RT throttling activated
Aug 14 18:31:12 kdevops-xfs-dev kernel: INFO: task xfsaild/nvme1n1:289 blocked for more than 120 seconds.
Aug 14 18:31:12 kdevops-xfs-dev kernel:       Not tainted 5.7.0-2-amd64 #1 Debian 5.7.10-1
Aug 14 18:31:12 kdevops-xfs-dev kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 18:31:12 kdevops-xfs-dev kernel: xfsaild/nvme1n1 D    0   289      2 0x80004000
Aug 14 18:31:12 kdevops-xfs-dev kernel: Call Trace:
Aug 14 18:31:12 kdevops-xfs-dev kernel:  __schedule+0x2da/0x770
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? xlog_cil_force_lsn+0xc0/0x220 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  schedule+0x4a/0xb0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xlog_wait_on_iclog+0x113/0x130 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? wake_up_q+0xa0/0xa0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xfsaild+0x1bd/0x810 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? __switch_to+0x80/0x3c0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  kthread+0xf9/0x130
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? kthread_park+0x90/0x90
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ret_from_fork+0x35/0x40
Aug 14 18:31:12 kdevops-xfs-dev kernel: INFO: task loop16:912 blocked for more than 120 seconds.
Aug 14 18:31:12 kdevops-xfs-dev kernel:       Not tainted 5.7.0-2-amd64 #1 Debian 5.7.10-1
Aug 14 18:31:12 kdevops-xfs-dev kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 18:31:12 kdevops-xfs-dev kernel: loop16          D    0   912      2 0x80004000
Aug 14 18:31:12 kdevops-xfs-dev kernel: Call Trace:
Aug 14 18:31:12 kdevops-xfs-dev kernel:  __schedule+0x2da/0x770
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? xlog_cil_force_lsn+0xc0/0x220 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  schedule+0x4a/0xb0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xlog_wait_on_iclog+0x113/0x130 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? wake_up_q+0xa0/0xa0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xfsaild+0x1bd/0x810 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? __switch_to+0x80/0x3c0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  kthread+0xf9/0x130
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? kthread_park+0x90/0x90
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ret_from_fork+0x35/0x40
Aug 14 18:31:12 kdevops-xfs-dev kernel: INFO: task loop16:912 blocked for more than 120 seconds.
Aug 14 18:31:12 kdevops-xfs-dev kernel:       Not tainted 5.7.0-2-amd64 #1 Debian 5.7.10-1
Aug 14 18:31:12 kdevops-xfs-dev kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 18:31:12 kdevops-xfs-dev kernel: loop16          D    0   912      2 0x80004000
Aug 14 18:31:12 kdevops-xfs-dev kernel: Call Trace:
Aug 14 18:31:12 kdevops-xfs-dev kernel:  __schedule+0x2da/0x770
Aug 14 18:31:12 kdevops-xfs-dev kernel:  schedule+0x4a/0xb0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xlog_wait_on_iclog+0x113/0x130 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? wake_up_q+0xa0/0xa0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  __xfs_log_force_lsn+0x10a/0x1d0 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? xfs_file_fsync+0x1f4/0x230 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xfs_log_force_lsn+0x91/0x120 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  xfs_file_fsync+0x1f4/0x230 [xfs]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? __switch_to_asm+0x34/0x70
Aug 14 18:31:12 kdevops-xfs-dev kernel:  loop_queue_work+0x47d/0xa50 [loop]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? __switch_to+0x80/0x3c0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? __schedule+0x2e2/0x770
Aug 14 18:31:12 kdevops-xfs-dev kernel:  kthread_worker_fn+0x73/0x1d0
Aug 14 18:31:12 kdevops-xfs-dev kernel:  kthread+0xf9/0x130
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? loop_info64_to_compat+0x220/0x220 [loop]
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ? kthread_park+0x90/0x90
Aug 14 18:31:12 kdevops-xfs-dev kernel:  ret_from_fork+0x35/0x40
Aug 14 18:31:12 kdevops-xfs-dev kernel: INFO: task umount:2212 blocked for more than 120 seconds.
Aug 14 18:31:12 kdevops-xfs-dev kernel:       Not tainted 5.7.0-2-amd64 #1 Debian 5.7.10-1
Aug 14 18:31:12 kdevops-xfs-dev kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 18:31:12 kdevops-xfs-dev kernel: umount          D    0  2212   2208 0x00004000
Comment 1 Eric Sandeen 2020-08-14 19:19:24 UTC
This seems relevant, no?

> Aug 14 18:28:16 kdevops-xfs-dev kernel: nvme nvme1: I/O 128 QID 2 timeout,
> aborting
> Aug 14 18:28:16 kdevops-xfs-dev kernel: nvme nvme1: Abort status: 0x4001
> Aug 14 18:28:47 kdevops-xfs-dev kernel: nvme nvme1: I/O 128 QID 2 timeout,
> reset controller

then 2.5 minutes later,

> Aug 14 18:31:12 kdevops-xfs-dev kernel: INFO: task xfsaild/nvme1n1:289
> blocked for more than 120 seconds.
Comment 2 Dave Chinner 2020-08-14 22:41:14 UTC
On Fri, Aug 14, 2020 at 06:43:18PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> Just ssh to kdevops-xfs and run:
> 
> cd /var/lib/xfstests/
> ./gendisks.sh -m
> ./check generic/074
> 
> Aug 14 18:27:34 kdevops-xfs-dev kernel: XFS (loop16): Mounting V5 Filesystem
> Aug 14 18:27:34 kdevops-xfs-dev kernel: XFS (loop16): Ending clean mount
> Aug 14 18:27:34 kdevops-xfs-dev kernel: xfs filesystem being mounted at
> /media/test supports timestamps until 2038 (0x7fffffff)
> Aug 14 18:28:16 kdevops-xfs-dev kernel: nvme nvme1: I/O 128 QID 2 timeout,
> aborting
> Aug 14 18:28:16 kdevops-xfs-dev kernel: nvme nvme1: Abort status: 0x4001
> Aug 14 18:28:47 kdevops-xfs-dev kernel: nvme nvme1: I/O 128 QID 2 timeout,
> reset controller

Hardware lost an IO. I'm guessing the error handling that reset the
controller failed to error out the bio the lost IO belonged to, so
XFS has hung waiting for it...

Cheers,

Dave.

Note You need to log in before you can comment on or make changes to this bug.