I was trying to delete a large file of 44GB over nfs and then the server started acting weird and unresponsive. It won't even reboot. I managed to get this out of kernel logs.

Additionally, I very frequently get an "Input/output Error" whenever I am trying to remove or upload or even download files over nfs, even though *sometimes* the action, like deleting, completed successfully. I don't know if this is related. I didn't see a crash like this before in the past, even when I got the input/output error.

Oct 12 12:56:02 smeagol kernel: [70821.740016] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Oct 12 12:56:02 smeagol kernel: [70821.740034] IP: [<ffffffff8111818c>] ext4_ext_remove_space+0x9ec/0xcb0
Oct 12 12:56:02 smeagol kernel: [70821.740053] PGD 3436f067 PUD 340e8067 PMD 0
Oct 12 12:56:02 smeagol kernel: [70821.740064] Oops: 0000 [#1]
Oct 12 12:56:02 smeagol kernel: [70821.740070] CPU 0
Oct 12 12:56:02 smeagol kernel: [70821.740074] Modules linked in: iptable_nat nf_nat iptable_mangle xt_LOG xt_mac nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport iptable_filter ip_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd soundcore snd_page_alloc
Oct 12 12:56:02 smeagol kernel: [70821.740115]
Oct 12 12:56:02 smeagol kernel: [70821.740122] Pid: 4065, comm: nfsd Tainted: G W 3.4.9-gentoo #1 Gigabyte Technology Co., Ltd. GA-880GA-UD3H/GA-880GA-UD3H
Oct 12 12:56:02 smeagol kernel: [70821.740138] RIP: 0010:[<ffffffff8111818c>] [<ffffffff8111818c>] ext4_ext_remove_space+0x9ec/0xcb0
Oct 12 12:56:02 smeagol kernel: [70821.740154] RSP: 0018:ffff88003843dbb0 EFLAGS: 00010246
Oct 12 12:56:02 smeagol kernel: [70821.740161] RAX: 0000000000000000 RBX: ffff880001fa24b0 RCX: 0000000000000001
Oct 12 12:56:02 smeagol kernel: [70821.740170] RDX: 0000000000000001 RSI: 000000000a59a202 RDI: 000000000a59a202
Oct 12 12:56:02 smeagol kernel: [70821.740178] RBP: ffff88003961d6b8 R08: 000000000c2ad000 R09: ffff880001fa2480
Oct 12 12:56:02 smeagol kernel: [70821.740186] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001fa2480
Oct 12 12:56:02 smeagol kernel: [70821.740194] R13: 0000000000000001 R14: ffff880017e90de0 R15: ffff88003df56060
Oct 12 12:56:02 smeagol kernel: [70821.740206] FS: 00007fd37a824740(0000) GS:ffffffff81471000(0000) knlGS:0000000000000000
Oct 12 12:56:02 smeagol kernel: [70821.740218] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 12 12:56:02 smeagol kernel: [70821.740226] CR2: 0000000000000028 CR3: 000000003849c000 CR4: 00000000000007f0
Oct 12 12:56:02 smeagol kernel: [70821.740235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 12 12:56:02 smeagol kernel: [70821.740244] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 12 12:56:02 smeagol kernel: [70821.740254] Process nfsd (pid: 4065, threadinfo ffff88003843c000, task ffff880038646eb0)
Oct 12 12:56:02 smeagol kernel: [70821.740264] Stack:
Oct 12 12:56:02 smeagol kernel: [70821.740269] 00000000fffffffb ffff88003961d6b8 ffff88003961d618 ffff88003843dc50
Oct 12 12:56:02 smeagol kernel: [70821.740282] ffff8800388b4478 ffff8800fffffff5 ffff88003c296800 ffffffff0000000a
Oct 12 12:56:02 smeagol kernel: [70821.740295] 00000000fffffffe ffff880031a3b00c ffff880031a3b000 000000003961d6b8
Oct 12 12:56:02 smeagol kernel: [70821.740307] Call Trace:
Oct 12 12:56:02 smeagol kernel: [70821.740319] [<ffffffff8111a015>] ? ext4_ext_truncate+0x1c5/0x220
Oct 12 12:56:02 smeagol kernel: [70821.740332] [<ffffffff81104798>] ? ext4_evict_inode+0x358/0x3e0
Oct 12 12:56:02 smeagol kernel: [70821.740343] [<ffffffff810b54b8>] ? evict+0x88/0x1a0
Oct 12 12:56:02 smeagol kernel: [70821.740354] [<ffffffff810b24f8>] ? d_delete+0x138/0x140
Oct 12 12:56:02 smeagol kernel: [70821.740365] [<ffffffff810ac17a>] ? vfs_unlink+0xda/0xe0
Oct 12 12:56:02 smeagol kernel: [70821.740375] [<ffffffff8113c454>] ? nfsd_unlink+0x1d4/0x270
Oct 12 12:56:02 smeagol kernel: [70821.740387] [<ffffffff81147873>] ? nfsd4_remove+0x53/0x100
Oct 12 12:56:02 smeagol kernel: [70821.740398] [<ffffffff8114f118>] ? nfsd4_encode_operation+0x68/0xa0
Oct 12 12:56:02 smeagol kernel: [70821.740410] [<ffffffff8114622a>] ? nfsd4_proc_compound+0x1da/0x520
Oct 12 12:56:02 smeagol kernel: [70821.740422] [<ffffffff81136f3d>] ? nfsd_dispatch+0xbd/0x1b0
Oct 12 12:56:02 smeagol kernel: [70821.740435] [<ffffffff8132850c>] ? svc_process+0x48c/0x780
Oct 12 12:56:02 smeagol kernel: [70821.740446] [<ffffffff81048460>] ? try_to_wake_up+0x80/0x80
Oct 12 12:56:02 smeagol kernel: [70821.740458] [<ffffffff811365e0>] ? nfsd_get_default_max_blksize+0x50/0x50
Oct 12 12:56:02 smeagol kernel: [70821.740469] [<ffffffff81136685>] ? nfsd+0xa5/0x160
Oct 12 12:56:02 smeagol kernel: [70821.740479] [<ffffffff81041526>] ? kthread+0x96/0xa0
Oct 12 12:56:02 smeagol kernel: [70821.740492] [<ffffffff81347f44>] ? kernel_thread_helper+0x4/0x10
Oct 12 12:56:02 smeagol kernel: [70821.740503] [<ffffffff81041490>] ? kthread_worker_fn+0xf0/0xf0
Oct 12 12:56:02 smeagol kernel: [70821.740514] [<ffffffff81347f40>] ? gs_change+0xb/0xb
Oct 12 12:56:02 smeagol kernel: [70821.740521] Code: 8b 4b 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 48 89 43 18 0f b7 49 02 48 83 c1 01 48 89 0b e9 78 f8 ff ff 0f 1f 40 00 48 8b 43 28 <48> 8b 40 28 48 89 43 20 e9 4e f8 ff ff 0f 1f 80 00 00 00 00 48
Oct 12 12:56:02 smeagol kernel: [70821.740589] RIP [<ffffffff8111818c>] ext4_ext_remove_space+0x9ec/0xcb0
Oct 12 12:56:02 smeagol kernel: [70821.740601] RSP <ffff88003843dbb0>
Oct 12 12:56:02 smeagol kernel: [70821.740607] CR2: 0000000000000028
Oct 12 12:56:02 smeagol kernel: [70821.740624] ---[ end trace 7361e783a3acef85 ]---
This is in all likelihood an ext4 bug, but I'm assigning this to Bruce so that he can have a look first. It has no chance of being an NFS filesystem (i.e. client) bug...
(In reply to comment #0)
> Additionally, I very frequently get an "Input/output Error" whenever I am
> trying to remove or upload or even download files over nfs, even though
> *sometimes* the action, like deleting, completed successfully. I don't know if
> this is related. I didn't see a crash like this before in the past, even when I
> got the input/output error.

Ignoring this one paragraph under the assumption it's due to soft mounts as in bug 48771.

The ext4 bug appears to be a dup of 47611, fixed in the 3.4 stable branch by d8b868fd75 "ext4: fix kernel BUG on large-scale rm -rf commands", in 3.4.10.
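For anyone checking several 3.4.x servers against the fix version mentioned above, here is a minimal sketch of the comparison. The function names `parse_release` and `has_fix_3_4` are hypothetical, and the check only looks at the release string: it cannot see distribution backports and says nothing about kernels outside the 3.4 series.

```python
import re

# From the comment above: the fix landed in the 3.4 stable branch in 3.4.10.
FIXED_IN = (3, 4, 10)


def parse_release(release):
    """Extract (major, minor, patch) from a string such as '3.4.9-gentoo'.

    A missing patch level (e.g. '3.4') is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_fix_3_4(release):
    """Return True if a 3.4.x release string is at or above 3.4.10.

    Hypothetical helper: raises for other series, because this thread
    only states the fix version for the 3.4 stable branch.
    """
    version = parse_release(release)
    if version[:2] != (3, 4):
        raise ValueError("check only applies to 3.4.x kernels")
    return version >= FIXED_IN
```

On the server itself the release string can be taken from `platform.release()` in the Python standard library, for example `has_fix_3_4(platform.release())`.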
Thank you both for your comments. Will upgrade :-)

*** This bug has been marked as a duplicate of bug 47611 ***