Bug 102751 - infinite loop in jbd2_journal_destroy()
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-12 22:38 UTC by Mihai Donțu
Modified: 2016-03-20 10:01 UTC

See Also:
Kernel Version: 4.1.5
Subsystem:
Regression: No
Bisected commit-id:



Description Mihai Donțu 2015-08-12 22:38:22 UTC
While I was watching a video from a removable USB disk, the connecting cable failed (worn out from too much use) and I had to unplug it. I then noticed that vlc had started consuming 100% CPU time while in the zombie state. An Alt+SysRq+l showed this:

NMI backtrace for cpu 2
CPU: 2 PID: 17378 Comm: vlc Tainted: G           O    4.1.5-gentoo #1
Hardware name: Dell Inc. Latitude E7440/07F3F4, BIOS A15 05/19/2015
task: ffff88029d050000 ti: ffff8802cd80c000 task.ti: ffff8802cd80c000
RIP: 0010:[<ffffffff8cec3320>]  [<ffffffff8cec3320>] mutex_unlock+0x10/0x20
RSP: 0018:ffff8802cd80fcd0  EFLAGS: 00000202
RAX: 00000000fffffffb RBX: ffff880084068000 RCX: 0000000000000000
RDX: 0000000080000001 RSI: 0000000000000000 RDI: ffff8800840680e8
RBP: ffff8802cd80fd38 R08: 000000000000000a R09: 00000000000004b0
R10: 0000000000017e98 R11: 00000000000004b0 R12: ffff880084068398
R13: ffff8800840680e8 R14: ffff8802cd80fcf0 R15: ffff8800840680a0
FS:  00007fa8ac663700(0000) GS:ffff88041eb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5b3e946000 CR3: 000000000d80d000 CR4: 00000000001426e0
Stack:
 ffffffff8c3d1318 ffff880200000000 ffff88029d050000 ffffffff8c179cc0
 ffff8802cd80fcf0 ffff8802cd80fcf0 0000000028b119c8 ffff88015d99c400
 ffff88008406c000 ffff880185940400 ffff880084068800 ffff88029d050000
Call Trace:
 [<ffffffff8c3d1318>] ? jbd2_journal_destroy+0x138/0x240
 [<ffffffff8c179cc0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffff8c38f0e7>] ext4_put_super+0x67/0x360
 [<ffffffff8c29d726>] generic_shutdown_super+0x76/0x100
 [<ffffffff8c29dae7>] kill_block_super+0x27/0x80
 [<ffffffff8c29de59>] deactivate_locked_super+0x49/0x80
 [<ffffffff8c29e2cc>] deactivate_super+0x6c/0x80
 [<ffffffff8c2bc033>] cleanup_mnt+0x43/0xa0
 [<ffffffff8c2bc0e2>] __cleanup_mnt+0x12/0x20
 [<ffffffff8c153804>] task_work_run+0xd4/0xf0
 [<ffffffff8c139174>] do_exit+0x2f4/0xb90
 [<ffffffff8c1d381c>] ? __audit_syscall_entry+0xac/0x100
 [<ffffffff8c05f745>] ? do_audit_syscall_entry+0x55/0x80
 [<ffffffff8c139a9b>] do_group_exit+0x3b/0xb0
 [<ffffffff8c139b24>] SyS_exit_group+0x14/0x20
 [<ffffffff8cec59db>] system_call_fastpath+0x16/0x6e
Code: ff 4c 89 e7 e8 d2 1e 00 00 5b 41 5c 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 47 18 00 00 00 00 f0 ff 07 <7f> 0a 55 48 89 e5 e8 95 ff ff ff 5d c3 0f 1f 00 0f 1f 44 00 00

and perf top (first 9 lines):

  18.08%  [kernel]  [k] _raw_spin_lock
  17.97%  [kernel]  [k] mutex_lock
  15.36%  [kernel]  [k] mutex_unlock
  10.89%  [kernel]  [k] _raw_spin_unlock
   6.49%  [kernel]  [k] jbd2_log_do_checkpoint
   6.16%  [kernel]  [k] preempt_count_add
   4.53%  [kernel]  [k] jbd2_cleanup_journal_tail
   3.96%  [kernel]  [k] preempt_count_sub
   3.21%  [kernel]  [k] jbd2_journal_destroy

Looking at the code it would seem that I've hit a race in:

  while (journal->j_checkpoint_transactions != NULL) { ... }

because it is waiting for checkpoint transactions to be written out, which can no longer happen now that the device is gone:

Buffer I/O error on dev dm-1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for dm-1-8.
Aborting journal on device dm-1-8.
Buffer I/O error on dev dm-1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for dm-1-8.

Maybe the loop should be abandoned when jbd2_log_do_checkpoint() fails?
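
The idea of abandoning the loop on error can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code and not the actual jbd2 implementation: mock_journal, mock_do_checkpoint(), mock_drop_checkpoints() and mock_journal_destroy() are hypothetical names, pending_checkpoints stands in for journal->j_checkpoint_transactions, and all locking is omitted.

```c
/* Userspace model only; every name here is hypothetical, not a jbd2 API. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_journal {
    int pending_checkpoints; /* stands in for j_checkpoint_transactions */
    bool device_gone;        /* true once the backing device has failed */
};

/* One checkpoint pass: retires one transaction, or fails with -EIO. */
static int mock_do_checkpoint(struct mock_journal *j)
{
    if (j->device_gone)
        return -EIO;
    if (j->pending_checkpoints > 0)
        j->pending_checkpoints--;
    return 0;
}

/* Discard the remaining checkpoint state without writing it out. */
static void mock_drop_checkpoints(struct mock_journal *j)
{
    j->pending_checkpoints = 0;
}

/*
 * Drain the checkpoint list. Without the error check, a failed device
 * would leave pending_checkpoints unchanged and the loop would spin
 * forever, which is the behaviour reported here.
 * Returns 0 on a clean drain, or the negative errno that ended the loop.
 */
static int mock_journal_destroy(struct mock_journal *j)
{
    int err = 0;

    assert(j != NULL);
    while (j->pending_checkpoints != 0) {
        err = mock_do_checkpoint(j);
        if (err) {
            /* Checkpointing cannot make progress: give up. */
            mock_drop_checkpoints(j);
            break;
        }
    }
    return err;
}
```

With device_gone set, mock_journal_destroy() returns -EIO after a single failed pass instead of spinning. Whether discarding the remaining checkpoint state is acceptable in the real journal is a separate question that this sketch does not answer.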

The USB failure has happened several times before, but I had never seen vlc get stuck until now. This also means that I am unlikely to be able to reproduce it. :-(

One more detail: the ext4 filesystem sits on top of a LUKS device.
Comment 1 Eryu Guan 2015-08-13 02:13:31 UTC
This looks like the issue described in the following thread, which also includes a patch that fixes the problem:

http://www.spinics.net/lists/linux-ext4/msg48563.html
