Bug 198187

Summary: jbd2_log_wait_commit hangs
Product: File System
Reporter: Alexander V. Lukyanov (lav)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: NEW
Severity: high
Priority: P1
CC: dralex2004, jack, pankajcabs, snitzer, tytso
Hardware: Intel
OS: Linux
Kernel Version: >=4.13
Regression: No
Attachments: hw config 1, hw config 2, LA graph, dmesg dump, output of 'dmsetup table'

Description Alexander V. Lukyanov 2017-12-18 12:56:48 UTC
Sometimes a process gets stuck in the jbd2_log_wait_commit wait state. After that, any "sync" (and probably fsync) calls get stuck too. The symptom is the load average (LA) growing to insane values. On one server the most commonly stuck process is logrotate inside an lxc container. I could not log in to two other affected servers. A few other servers at my disposal do not exhibit this behaviour. There are no disk-related messages in the log or in dmesg, and nothing suspicious at all in the log. A graceful reboot hangs too.

The problem started with 4.13.8 and did not occur in 4.12.8.

Here is the dnf history entry after which the problem started to occur:
    Erase    kernel-4.12.8-200.fc25.i686              @updates/25
    Install  kernel-4.13.8-100.fc25.i686              @updates/25

A kernel downgrade fixes the problem.
Comment 1 Theodore Tso 2017-12-18 23:39:10 UTC
How often/frequently/reliably does the problem reproduce?

Can you get console messages, either from dmesg or copied from a serial console (if you have serial console monitoring)? Also useful would be the output from the following magic SysRq sequences: sysrq-l and sysrq-t (this involves using the break key if you are using a serial console, which is the most desirable way to capture the output, since there is a huge amount of output).
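If you don't have a serial console, roughly the same information can be captured via /proc/sysrq-trigger; a minimal sketch, assuming sysrq is enabled on your system:
    echo 1 > /proc/sys/kernel/sysrq    # allow all sysrq functions (sysctl name may be restricted by the distro)
    echo l > /proc/sysrq-trigger       # backtraces of all active CPUs
    echo t > /proc/sysrq-trigger       # state of all tasks
    dmesg > sysrq-output.txt           # save the resulting kernel log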

Also, please let us know what the hardware configuration is, since it's possible this is related to change in a block device driver.

Finally, please file a bug with Fedora, since this may be caused by a Fedora-specific patch, and the Fedora kernel maintainers are more likely to notice clustering of symptoms from Fedora users.
Comment 2 Alexander V. Lukyanov 2017-12-20 05:35:56 UTC
It happens every few days, maybe during logrotate at night. I have not found a reliable sequence leading to the problem.

There were no unusual messages on the console or in the dmesg output.

Hardware info attached as two files.
Comment 3 Alexander V. Lukyanov 2017-12-20 05:36:35 UTC
Created attachment 261269 [details]
hw config 1
Comment 4 Alexander V. Lukyanov 2017-12-20 05:36:53 UTC
Created attachment 261271 [details]
hw config 2
Comment 5 Alexander V. Lukyanov 2017-12-20 05:40:26 UTC
Created attachment 261273 [details]
LA graph

Here is the LA graph from our monitoring. The server was rebooted many times in November.
Comment 6 Jan Kara 2018-01-04 16:10:12 UTC
As Ted said, we'd need sysrq output to debug this further. Actually, output from sysrq-w should be enough. So if this happens again, can you please trigger it (either by running 'echo w >/proc/sysrq-trigger' or by pressing the appropriate key combination on the serial console) and post the dmesg output after that? Thanks.
Comment 7 Alexander A. 2018-01-18 06:24:51 UTC
Created attachment 273661 [details]
dmesg dump

This output was taken from dmesg after running 'echo w >/proc/sysrq-trigger'.
Comment 8 Jan Kara 2018-01-18 10:05:53 UTC
Thanks for the output. Every process there is waiting for the jbd2 thread to commit the running transaction. The jbd2 thread itself is doing:

[144526.447408] jbd2/dm-19-8    D    0  1580      2 0x00000080
[144526.448176] Call Trace:
[144526.448937]  __schedule+0x3be/0x830
[144526.449632]  ? scsi_request_fn+0x3f/0x6b0
[144526.450310]  ? bit_wait+0x60/0x60
[144526.450993]  schedule+0x36/0x80
[144526.451673]  io_schedule+0x16/0x40
[144526.452343]  bit_wait_io+0x11/0x60
[144526.453017]  __wait_on_bit+0x58/0x90
[144526.453692]  out_of_line_wait_on_bit+0x8e/0xb0
[144526.454371]  ? bit_waitqueue+0x40/0x40
[144526.455051]  __wait_on_buffer+0x32/0x40
[144526.455731]  jbd2_journal_commit_transaction+0xfa4/0x1800
[144526.456414]  kjournald2+0xd2/0x270
[144526.457098]  ? kjournald2+0xd2/0x270
[144526.457782]  ? remove_wait_queue+0x70/0x70
[144526.458470]  kthread+0x109/0x140
[144526.459139]  ? commit_timeout+0x10/0x10
[144526.459821]  ? kthread_park+0x60/0x60
[144526.460497]  ? do_syscall_64+0x67/0x150
[144526.461166]  ret_from_fork+0x25/0x30

So we have submitted buffers for IO and they have not completed. In cases like this, the problem is in 99% of cases either in the storage driver or in the storage firmware. Since you mentioned this started happening after a kernel update, and you seem to be using only plain SATA drives (am I right?), storage firmware is probably out of the question.

You seem to be using some kind of RAID on top of these SATA drives; that looks like the most probable culprit at this point. Can you describe your storage configuration? The output of 'dmsetup table' would also be useful (some commands to gather this are sketched below). Once we have that, we will probably need to pull in DM developers to have a look. Thanks.
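For reference, a minimal sketch of commands that should capture the storage layout (run as root; adjust to whatever tooling your setup actually uses):
    lsblk -o NAME,TYPE,SIZE,MODEL    # physical disks and the device stack on top of them
    lvs -a -o +devices               # LVM logical volumes, including the raid1 legs
    dmsetup table                    # device-mapper targets and their parameters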
Comment 9 lavv17 2018-01-18 10:33:47 UTC
You are correct that we use plain SATA drives, with LVM on top of them in raid1 mode.

Comment 10 Alexander A. 2018-01-23 07:20:06 UTC
Created attachment 273805 [details]
output of the 'dmsetup table'

Posting output of the 'dmsetup table'
Comment 11 Jan Kara 2018-01-24 10:02:36 UTC
Thanks. Mike, any idea here? It appears JBD2 submitted IO to dm-raid1 and it never completed. Apparently this did not happen with 4.12.8 but it does happen relatively regularly with 4.13.8...
Comment 12 Mike Snitzer 2018-01-24 10:47:01 UTC
(In reply to Jan Kara from comment #11)
> Thanks. Mike, any idea here? It appears JBD2 submitted IO to dm-raid1 and it
> never completed. Apparently this did not happen with 4.12.8 but it does
> happen relatively regularly with 4.13.8...

$ git log --oneline v4.12.8^..v4.13.8 -- drivers/md/dm-raid1.c | tac
9966afa dm: fix REQ_RAHEAD handling
846785e dm: don't return errnos from ->map
1be5690 dm: change ->end_io calling convention
4e4cbee block: switch bios to blk_status_t

So just changes from Christoph to prepare for, and finally switch to, blk_status_t.

$ git log --oneline v4.12.8^..v4.13.8 -- drivers/md/dm.c | tac
846785e dm: don't return errnos from ->map
1be5690 dm: change ->end_io calling convention
4e4cbee block: switch bios to blk_status_t
7e026c8 dm: add ->copy_from_iter() dax operation support
abebfbe dm: add ->flush() dax operation support
011067b blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
47e0fb46 blk: make the bioset rescue_workqueue optional.
93e6442 dm: add basic support for using the select or poll function
a4aa5e5 dm: fix REQ_OP_ZONE_RESET bio handling
264c869 dm: fix REQ_OP_ZONE_REPORT bio handling
1099930 dm: introduce dm_remap_zone_report()
41341af dm: don't set bounce limit
fbd08e7 bio-integrity: fix interface for bio_integrity_trim
54385bf dm: fix the second dec_pending() argument in __split_and_process_bio()
6044078 dm: fix printk() rate limiting code
bfc0ab4 dax: remove the pmem_dax_ops->flush abstraction
666cb84 dm ioctl: fix alignment of event number in the device list

Nothing in these DM core changes stands out to me as a potential source of IO hangs.
But obviously the devil is in the details...

It could easily be that something in block core is negatively influencing DM, etc.
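If the hang can be reproduced on a test machine, a bisect over the v4.12.8..v4.13.8 range would narrow this down much faster than reading diffs; a rough sketch, assuming the hang can be provoked within a reasonable time:
    $ git bisect start v4.13.8 v4.12.8
    $ # build and boot the kernel git checks out, then wait to see whether the hang reproduces
    $ git bisect bad     # if the hang occurred on this kernel
    $ git bisect good    # if it did not
    $ # repeat until git reports the first bad commit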
Comment 13 pkb07815 2019-01-02 15:00:28 UTC
I am facing a similar problem with the 4.9.86 kernel. Do we have a root cause for the IO hangs?
Comment 14 Jan Kara 2019-01-02 17:22:28 UTC
Well, the reason for the jbd2 thread staying in jbd2_journal_commit_transaction() for a long time can simply be a busy disk. So without more details it's difficult to tell whether you are really facing the same issue. Which processes are hung, and does the hang eventually resolve itself or is a reboot needed? Please provide sysrq-w output from dmesg taken while the system is hung.