Bug 15231

Summary: kernel BUG at fs/ext4/inode.c:1852!
Product: File System Reporter: Franco Broi (franco)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: alan, dmonakhov, jack, matt, tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32.7 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: messages file
messages file
Patch to try reverting the cherry pick of 39bc680

Description Franco Broi 2010-02-05 09:38:14 UTC
We recently deployed 9 NFS servers running Linux kernel 2.6.32.7. Each server exports 4 13TB EXT4 filesystems. During the first 24 hous of testing using simple sequential reads and writes, 2 of the 9 systems hung with "BUG: soft lockup - CPU#3 stuck for 61s".

We downgraded the kernel version to 2.6.32.2 and have not had the same problem after a further 24 hours of testing.

Message file attached.
Comment 1 Franco Broi 2010-02-05 09:43:08 UTC
Created attachment 24916 [details]
messages file
Comment 2 Franco Broi 2010-02-05 09:50:07 UTC
Changed the summary to be more specific.
Comment 3 Franco Broi 2010-02-09 01:05:11 UTC
Just saw this bug on similar hardware running 2.6.32.2
There's an intersting bit of memory corruption from the traceback at the end of the messages file attached.

Feb  8 17:52:27 echo24 kernel: kernel BUG at fs/inode.c:1343!
Feb  8 17:52:27 echo24 kernel: invalid opcode: 0000 [#1] SMP
Feb  8 17:52:27 echo24 kernel: last sysfs file: /sys/devices/platform/i5k_amb.0/temp2_input
Feb  8 17:52:27 echo24 kernel: CPU 5
Feb  8 17:52:27 echo24 kernel: Modules linked in: usb_storage nfsd exportfs mx_driver(P) mx_mcp(P) nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 ext4 jbd2 crc16 dm_multipath uinput e1000e mptsas iTCO_wdt iTCO_vendor_support mptscsih i2c_i801 i2c_core pcspkr mptbase shpchp i5k_amb ioatdma serio_raw hwmon scsi_transport_sas dca ata_generic [last unloaded: myri10ge]
Feb  8 17:52:27 echo24 kernel: Pid: 2677, comm: nfsd Tainted: P        W  2.6.32.2 #1 X7DWE
Feb  8 17:52:27 echo24 kernel: RIP: 0010:[<ffffffff810fdaf9>]  [<ffffffff810fdaf9>] iput+0x1b/0x66
Feb  8 17:52:27 echo24 kernel: RSP: 0018:ffff88021183fd30  EFLAGS: 00010246
Feb  8 17:52:27 echo24 kernel: RAX: 0000000000000000 RBX: ffff88021b16dbc8 RCX: 00000000002a0024
Feb  8 17:52:27 echo24 kernel: RDX: 0000000000000001 RSI: ffffea00010cdb10 RDI: ffff88021b16dbc8
Feb  8 17:52:27 echo24 kernel: RBP: ffff88021183fd40 R08: ffff880210cd5d40 R09: 0000000000000003
Feb  8 17:52:27 echo24 kernel: R10: 0000000000000001 R11: ffff88021183fc30 R12: ffff88021b16db80
Feb  8 17:52:27 echo24 kernel: R13: 0000000000001000 R14: ffff880211f20000 R15: ffff88022df50e00
Feb  8 17:52:27 echo24 kernel: FS:  0000000000000000(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
Feb  8 17:52:27 echo24 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb  8 17:52:27 echo24 kernel: CR2: 0000000000736f68 CR3: 0000000210dde000 CR4: 00000000000406e0
Feb  8 17:52:27 echo24 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb  8 17:52:27 echo24 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb  8 17:52:27 echo24 kernel: Process nfsd (pid: 2677, threadinfo ffff88021183e000, task ffff880211f19760)
Feb  8 17:52:27 echo24 kernel: Stack:
Feb  8 17:52:27 echo24 kernel: ffff88021183fd50 ffff88014dfc6400 ffff88021183fd60 ffffffff812bdb2e
Feb  8 17:52:27 echo24 kernel: <0> ffff88014dfc6400 ffff88014dfc6410 ffff88021183fd80 ffffffffa01a51c5
Feb  8 17:52:27 echo24 kernel: <0> ffff88021183fd80 ffff88014dfc6400 ffff88021183fdb0 ffffffffa01af5d9
Feb  8 17:52:27 echo24 kernel: Call Trace:
Feb  8 17:52:27 echo24 kernel: [<ffffffff812bdb2e>] sock_release+0x5c/0x6c
Feb  8 17:52:27 echo24 kernel: [<ffffffffa01a51c5>] svc_sock_free+0x45/0x51 [sunrpc]
Feb  8 17:52:27 echo24 kernel: [<ffffffffa01af5d9>] svc_xprt_free+0x3e/0x4e [sunrpc]
Feb  8 17:52:27 echo24 kernel: [<ffffffffa01af59b>] ? svc_xprt_free+0x0/0x4e [sunrpc]
Feb  8 17:52:27 echo24 kernel: [<ffffffff811ab411>] kref_put+0x43/0x4f
Feb  8 17:52:27 echo24 kernel: [<ffffffffa01aeaed>] svc_xprt_release+0xc6/0xce [sunrpc]
Feb  8 17:52:27 echo24 kernel: [<ffffffffa01af2f4>] svc_recv+0x72d/0x78b [sunrpc]
Feb  8 17:52:27 echo24 kernel: [<ffffffff81049930>] ? default_wake_function+0x0/0xf
Feb  8 17:52:27 echo24 kernel: [<ffffffffa030f7df>] ? nfsd+0x0/0x133 [nfsd]
Feb  8 17:52:27 echo24 kernel: [<ffffffffa030f87f>] nfsd+0xa0/0x133 [nfsd]
Feb  8 17:52:27 echo24 kernel: [<ffffffff81065541>] kthread+0x7d/0x85
Feb  8 17:52:27 echo24 kernel: [<ffffffff81011cda>] child_rip+0xa/0x20
Feb  8 17:52:27 echo24 kernel: [<ffffffff8101163d>] ? restore_args+0x0/0x30
Feb  8 17:52:27 echo24 kernel: [<ffffffff810654c4>] ? kthread+0x0/0x85
Feb  8 17:52:27 echo24 kernel: [<ffffffff81011cd0>] ? child_rip+0x0/0x20
Feb  8 17:52:27 echo24 kernel: Code: c0 74 d5 48 83 c4 20 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 50 48 83 bf 10 02 00 00 40 75 04 <0f> 0b eb fe 48 8d 7f 48 48 c7 c6 20 5b 88 81 e8 6f b5 0a 00 85
Feb  8 17:52:27 echo24 kernel: RIP  [<ffffffff810fdaf9>] iput+0x1b/0x66
Feb  8 17:52:27 echo24 kernel: RSP <ffff88021183fd30>
Feb  8 17:52:27 echo24 kernel: ---[ end trace 56313310fcbaf839 ]---
Comment 4 Franco Broi 2010-02-09 01:06:03 UTC
Created attachment 24960 [details]
messages file
Comment 5 Theodore Tso 2010-02-09 03:52:45 UTC
OK, so the dump in comment #3 and #4 are a completely different issue.  It's not even clear it's an ext4 issue.  (An oops in fs/inode.c is quite different from an oops in fs/ext4/inode.c).
Comment 6 Theodore Tso 2010-02-09 03:57:17 UTC
It looks the failure is in comment #1 is due to commit dbe5cc00, which is a cherry pick of commit 39bc680a (ext4: fix sleep inside spinlock issue with quota and dealloc).  And which was introduced in 2.6.32.3.   

I didn't think we were going to merge this into -stable, since it introduced this failure.  Fixing it requires a series of commits.  Argh....  I'll try to get to figuring out which of the upstream commits need to go to -stable to fix this bug.
Comment 7 Jan Kara 2010-02-09 18:08:59 UTC
Ted, what problem does commit 39bc680a cause? Because I thought all the problems were introduced by commit d21cd8f (also mentioned in the changelogs) and that was not committed to -stable (exactly because of this). But the problems described in commit 0637c6f4135f592f094207c7c21e7c0fc5557834 were always there so maybe commit 39bc680a just makes them visible... Shrug.

Maybe just backing out 39bc680a would be an option for -stable...
Comment 8 Theodore Tso 2010-02-09 18:27:22 UTC
Well, the BUG_ON getting tripped as reported in comment #1 in this bug is:

    BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks);

... and the commit 39bc680a does mess with how i_reserved_meta_blocks is calculated.


@@ -1836,10 +1834,11 @@ repeat:
                }
                return -ENOSPC;
        }
+       spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
        EXT4_I(inode)->i_reserved_data_blocks += nrblocks;
-       EXT4_I(inode)->i_reserved_meta_blocks = mdblocks;
-
+       EXT4_I(inode)->i_reserved_meta_blocks += md_needed;
        spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+
        return 0;       /* success */

I'm going to need to go back and see whether I screwed up earlier and blamed the wrong commit earlier, or whether that commit was implicated in some other regression.  However, given the claimed regression between 2.6.32.2 and 2.6.32.7, and given that commit 39bc680a was applied between 2.6.32.2 and 2.6.32.3, at the very least we should try backing out 39bc680a and seeing whether that makes Franco's problem go away.
Comment 9 Theodore Tso 2010-02-09 18:30:44 UTC
Created attachment 24975 [details]
Patch to try reverting the cherry pick of 39bc680

Franco, could you try applying this revert to either 2.6.32.7 or 2.7.32.8, and see if the problem still recurs?  Many thanks!!
Comment 10 Franco Broi 2010-02-10 03:26:04 UTC
I've applied the patch to 2.6.32.7 - strangely the patch program segfaulted but applied the patch correctly.

[root@echo2 linux-2.6.32.7]# patch -p1 < patch
patching file fs/ext4/inode.c
Segmentation fault

Feb 10 02:11:57 echo2 kernel: patch[18919]: segfault at 0 ip 0000003a0c07ee70 sp 00007fff85523278 error 4 in libc-2.10.1.so[3a0c000000+164000]

Also, before booting the new kernel I did a df, /data6 had a negative used block count. I guess this could be related to the current problem if the ref counting has gone a bit skew-whiff although this is with the system running 2.6.32.2.

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vgdata--3-data5
                     13458176552   2072548 13456104004   1% /data5
/dev/mapper/vgdata--3-data6
                     13458176552  -3576068 13461752620   0% /data6
/dev/mapper/vgdata--4-data7
                     13458176552  40839788 13417336764   1% /data7
/dev/mapper/vgdata--4-data8
                     13458176552   1933764 13456242788   1% /data8

I unmounted and ran fsck, it said it was clean and when I remounted it appeared to fix itself.

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vgdata--3-data5
                     13458176552   2072548 13456104004   1% /data5
/dev/mapper/vgdata--3-data6
                     13458176552   1920236 13456256316   1% /data6
/dev/mapper/vgdata--4-data7
                     13458176552  40974956 13417201596   1% /data7
/dev/mapper/vgdata--4-data8
                     13458176552   1933764 13456242788   1% /data8

The patched kernel is running on 3 servers - watch this space..
Comment 11 Franco Broi 2010-02-11 03:52:34 UTC
All clear after 24 hours of testing but still seeing the other non-ext4 bug.

echo3: kernel BUG at fs/inode.c:1343!
echo4: kernel BUG at fs/inode.c:1343!
Comment 12 Theodore Tso 2010-02-11 04:47:31 UTC
So the "kernel BUG at fs/inode.c:1343!" failure was showing up on both 2.6.32.2 and 2.6.32.7, right?   Do you know when it first started showing up?

The kernel stack trace indicates that it's coming from releasing one the nfs kernel thread's network sockets, when the TCP connection is broken or something else goes wrong.   I don't think it's ext4 related at all; it looks like an NFS server issue.
Comment 13 Franco Broi 2010-02-11 05:12:23 UTC
I've only seen this bug starting with 2.6.32.2. I haven't tried anyting between this version and 2.6.32.7.
Comment 14 Franco Broi 2010-02-11 05:22:19 UTC
Sorry, yes, the above errors were with the patched 2.6.32.7 kernel.
Comment 15 Franco Broi 2010-02-12 03:29:45 UTC
It looks like the patch has worked, thanks.

Got a few more of the fs/inode.c:1343! errors though.
Comment 16 Theodore Tso 2010-02-15 17:14:25 UTC
Franco,

Could you file a separate bug for the fs/inode.c:1343 problem?   And I'd suggest attaching it to the NFS component since that's what I strongly suspect is going on.

The other thing that would be really big help for me is if you could try running 2.6.33-rc8.  It has a series of fixes which I believe should address the quota problem, while still leaving the quota fix in place, and it would be really good to know if it has been fixed before Linus releases 2.6.33.  2.6.33-rc8 is pretty close to 2.6.33-final, so it should be relatively safe to run it.  Still, if you're hesitant to exposing your production machines to a pre-release kernel, I'll completely understand.

Thanks,

- Ted
Comment 17 Franco Broi 2010-02-16 00:49:00 UTC
Sorry Ted but I can't run any further tests on the production hardware but we have ordered another PB of RAID disk and servers which should start arriving in the next few weeks. So if you can hang out until then, I'll be able to give it a good trashing.
Comment 18 Franco Broi 2010-02-16 06:45:27 UTC
Trashing, hmmm bit of a Freudian slip! Of course I meant thrashing...