Running a machine with 8 dualcore Opterons 875 (16cores total), which makes it far more sensitive to locking-, timing- and racing bugs than the average linux- machine. Running a multi-threaded app (16 threads) that reads small portions of data from a large file (stored on remote server, accessed over NFS-udp), and write back to another file - same NFS-share. System crashes really hard (beyond softwatchdog- and oops-recovery) anywhere between 1 and 15 minutes. On 2.6.15.x, problem appeared first, was fixed by kernelpatches to file.c and pagelist.c. Went to 2.6.17.6 for the much improved multi-dualcore support, same problem appeared - unfortunately the original patch is integrated already, so it must be something else this time. Kernel does not OOPS, but it locks up - on all CPUs, according to the logs. Kernel BUG at mm/truncate.c:76 invalid opcode: 0000 [1] SMP CPU 14 Modules linked in: nfs netconsole sch_sfq cls_u32 sch_tbf sch_prio iptable_filter ip_tables x_tables nfsd exportfs lockd 8250 seri al_core ipv6 parport_pc lp parport autofs4 sunrpc w83627hf_wdt binfmt_misc xfs dm_mod video button battery ac ohci1394 ieee1394 oh ci_hcd ehci_hcd i2c_nforce2 i2c_core tg3 floppy ide_cd cdrom Pid: 8086, comm: dipfilter.x Not tainted 2.6.17.6 #1 RIP: 0010:[<ffffffff8025eb96>] <ffffffff8025eb96>{invalidate_complete_page+86} RSP: 0018:ffff810393c09ca8 EFLAGS: 00010002 RAX: 0000000000000825 RBX: ffff8105ff8d86f0 RCX: ffff8103f271bb08 RDX: 0000000000000000 RSI: ffff810393c09c48 RDI: ffff8103f271bd88 RBP: ffff8103f271bd70 R08: 0000000000000001 R09: 000000000000002c R10: 000000000000002c R11: ffff8105f8a26240 R12: ffff810393c09e08 R13: 0000000000000000 R14: ffff8103f271bd70 R15: 00000000000d1a2c FS: 00002accace13dc0(0000) GS:ffff810e001bd9c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b050fbff000 CR3: 0000000ff7c21000 CR4: 00000000000006e0 Process dipfilter.x (pid: 8086, threadinfo ffff810393c08000, task ffff8103f8b70280) Stack: 00000000000d1a2b ffff8105ff8d86f0 0000000000000000 ffffffff8025f0c0 0000000000000000 00000000f271bc28 0000000000000000 ffffffffffffffff 000000000000000e 0000000000000000 Call Trace: <ffffffff8025f0c0>{invalidate_inode_pages2_range+320} <ffffffff882042d9>{:nfs:nfs_revalidate_mapping+105} <ffffffff88202e29>{:nfs:nfs_file_write+169} <ffffffff8027d710> {do_sync_write+208} <ffffffff802424e0>{autoremove_wake_function+0} <ffffffff80227a00> {default_wake_function+0} <ffffffff8027d80f>{vfs_write+191} <ffffffff8027d9a3>{sys_write+83} <ffffffff80209c06>{system_call+126} Code: 0f 0b 68 dc f2 46 80 c2 4c 00 48 89 df e8 18 74 ff ff f0 81 RIP <ffffffff8025eb96>{invalidate_complete_page+86} RSP <ffff810393c09ca8> NMI Watchdog detected LOCKUP on CPU 10 CPU 10 Modules linked in: nfs netconsole sch_sfq cls_u32 sch_tbf sch_prio iptable_filter ip_tables x_tables nfsd exportfs lockd 8250 seri al_core ipv6 parport_pc lp parport autofs4 sunrpc w83627hf_wdt binfmt_misc xfs dm_mod video button battery ac ohci1394 ieee1394 oh ci_hcd ehci_hcd i2c_nforce2 i2c_core tg3 floppy ide_cd cdrom Pid: 8106, comm: dipfilter.x Not tainted 2.6.17.6 #1 RIP: 0010:[<ffffffff80309579>] <ffffffff80309579>{__read_lock_failed+5} RSP: 0018:ffff8103f8647c08 EFLAGS: 00000097 RAX: ffff8103f271bd88 RBX: 00000000000d2431 RCX: ffff8103f8647d58 RDX: 0000000000000000 RSI: 00000000000d2431 RDI: ffff8103f271bd88 RBP: ffff8103f271bd70 R08: ffff8103f8646000 R09: 00000000ffffffff R10: 00000000d2864068 R11: 0000000000000001 R12: ffff8103f271bd70 R13: 0000000000001000 R14: 0000000000001000 R15: ffff8103f8e83be8 FS: 00002b15cc9e2dc0(0000) GS:ffff810a0016ad40(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fff0b64dbd8 CR3: 0000000393c21000 CR4: 00000000000006e0 Process dipfilter.x (pid: 8106, threadinfo ffff8103f8646000, task ffff8103f8645820) Stack: ffffffff8044802a ffffffff802569b5 ffff8103f271bd70 ffff810bff67f678 00000000000d2431 ffffffff80256f7d ffffffff804479ec 0000000000000000 0000000000000000 00000000d2438000 Call Trace: <ffffffff8044802a>{.text.lock.spinlock+83} <ffffffff802569b5>{find_get_page+21} <ffffffff80256f7d> {do_generic_mapping_read+397} <ffffffff804479ec>{__up_wakeup+53} <ffffffff80257360>{file_read_actor+0} <ffffffff802591e9>{__generic_file_aio_read+425} <ffffffff802593f4> {generic_file_aio_read+52} <ffffffff88202aba>{:nfs:nfs_file_read+170} <ffffffff8027d490> {do_sync_read+208} <ffffffff802424e0>{autoremove_wake_function+0} <ffffffff80227a00> {default_wake_function+0} <ffffffff8027d58c>{vfs_read+188} <ffffffff8027d913>{sys_read+83} <ffffffff80209c06>{system_call+126} Code: 83 38 01 78 f9 f0 ff 08 0f 88 ed ff ff ff c3 90 90 90 90 90 console shuts up ... NMI Watchdog detected LOCKUP on CPU 0 <and so on for all CPUs>
Have you tried 2.6.18-rc1? That contains a patch that ensures pending writes that would be affected by a truncate() call are cancelled.
*** Bug 6852 has been marked as a duplicate of this bug. ***
*** Bug 6853 has been marked as a duplicate of this bug. ***
Just tried 2.6.18-rc2, same problem, although the bug surfaces at a different location. Plus my consoles gets swamped with messages about reducing readahead, and there is a dead loop problem on eth0 (broadcom tg3, can't find it anywhere in the logs but it was visible on the console), but those don't seem to affect normal operation. Kernel BUG at fs/nfs/pagelist.c:87 invalid opcode: 0000 [1] SMP CPU 6 Modules linked in: nfs netconsole sch_sfq cls_u32 sch_tbf sch_prio iptable_filter ip_tables x_tables nfsd exportfs lockd 8250 serial_core ipv6 parport_pc lp parport autofs4 sunrpc w83627hf_wdt binfmt_misc xfs dm_mod video button battery ac ohci1394 ieee1394 ohci_hcd ehci_hcd i2c_nforce2 i2c_core tg3 floppy ide_cd [<ffffffff8820e6f8>] :nfs:nfs_readpage+0x158/0x4a0 [<ffffffff8025e09f>] do_generic_mapping_read+0x26f/0x580 [<ffffffff8025e3b0>] file_read_actor+0x0/0x110 [<ffffffff80260299>] __generic_file_aio_read+0x1a9/0x200 [<ffffffff802604a4>] generic_file_aio_read+0x34/0x40 [<ffffffff88206990>] :nfs:nfs_file_llseek+0x0/0x70 Dead loop on netdevice eth0, fix it urgently! [<ffffffff8028548c>] vfs_read+0xbc/0x170 [<ffffffff80285813>] sys_read+0x53/0x90 [<ffffffff80209c26>] system_call+0x7e/0x83 Code: 0f 0b 68 10 68 21 88 c2 57
OK, that 'debugging code' is definitely a bug. Could you remove the line BUG_ON(PagePrivate(page)); on line 87 of fs/nfs/pagelist.c and then try again?
Removed the line, no change - crashed hard, this time in truncate.c. Again on a PagePrivate(page) line, but you probably saw that already. Every crash is related to NFS-writes, I've tried the same program reading NFS but writing local disk - disk was unhappy with 16 threads hammering it, but kernel did not crash. ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at mm/truncate.c:76 invalid opcode: 0000 [1] SMP CPU 1 Modules linked in: nfs netconsole sch_sfq cls_u32 sch_tbf sch_prio iptable_filter ip_tables x_tables nfsd exportfs lockd 8250 serial_core ipv6 parport_pc lp parport autofs4 sunrpc w83627hf_wdt binfmt_misc xfs dm_mod video button battery ac ohci1394 ieee1394 ohci_hcd ehci_hcd i2c_nforce2 i2c_core tg3 floppy ide_cd cdromPid: 6542, comm: dipfilter.x Not tainted 2.6.18-rc2 #3 RIP: 0010:[<ffffffff802653a6>] [<ffffffff802653a6>] invalidate_complete_page+0x56/0xa0 RSP: 0018:ffff8101f7d81ca8 EFLAGS: 00010002 RAX: 0000000000000825 RBX: ffff8109ffcff998 RCX: ffff8103f8574b08 RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffff8103f8574d88 RBP: ffff8103f8574d70 R08: 0000000000040000 R09: 0000000000040000 R10: ffff810bf4552930 R11: 0000000000000003 R12: 0000000000000000 R13: 0000000000000002 R14: ffff8103f8574d70 R15: 000000000001cfac FS: 00002ad29e1eedc0(0000) GS:ffff810400091340(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b1bc142e000 CR3: 00000005f77f1000 CR4: 00000000000006e0 Process dipfilter.x (pid: 6542, threadinfo ffff8101f7d80000, task ffff8101f22d9180) Stack: 000000000001cfab ffff8109ffcff998 0000000000000000 ffffffff802658c0 ffffffffffffffff 000000008820fb50 0000000000000000 ffffffffffffffff 0000000000000009 0000000000000000 ffff810dffd11ba8 ffff810dffcd8b40 Call Trace: [<ffffffff802658c0>] invalidate_inode_pages2_range+0x140/0x210 [<ffffffff882085f9>] :nfs:nfs_revalidate_mapping+0xb9/0x150 [<ffffffff88206b61>] :nfs:nfs_file_read+0x71/0xe0 [<ffffffff80285390>] do_sync_read+0xd0/0x110 [<ffffffff80244450>] autoremove_wake_function+0x0/0x30 [<ffffffff8045faef>] thread_return+0x0/0xf1 [<ffffffff8028548c>] vfs_read+0xbc/0x170 [<ffffffff80285813>] sys_read+0x53/0x90 [<ffffffff80209c26>] system_call+0x7e/0x83 Code: 0f 0b 68 e6 e1 49 80 c2 4c 00 48 89 df e8 28 7c ff ff f0 81 RIP [<ffffffff802653a6>] invalidate_complete_page+0x56/0xa0 RSP <ffff8101f7d81ca8> NMI Watchdog detected LOCKUP on CPU 0 CPU 0 Modules linked in: nfs netconsole sch_sfq cls_u32 sch_tbf sch_prio iptable_filter ip_tables x_tables nfsd exportfs lockd 8250 serial_core ipv6 parport_pc lp parport autofs4 sunrpc w83627hf_wdt binfmt_misc xfs dm_mod video button battery ac ohci1394 ieee1394 ohci_hcd ehci_hcd i2c_nforce2 i2c_core tg3 floppy ide_cd cdromPid: 5496, comm: rpciod/0 Not tainted 2.6.18-rc2 #3 RIP: 0010:[<ffffffff8031403d>] [<ffffffff8031403d>] __write_lock_failed+0x9/0x20 RSP: 0018:ffff8103f7cb9d78 EFLAGS: 00000087 RAX: ffff8103f8574d88 RBX: ffff8109ffd3db60 RCX: 0000000000000282 RDX: ffff8109f7d74e40 RSI: 0000000000000286 RDI: ffff8103f8574d88 RBP: ffff8109ffd3db60 R08: ffff810400396440 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff80219210 R12: ffff8103f8574d70 R13: ffff8103f8574d88 R14: ffff810bf7e73d58 R15: ffffffff88126b00 FS: 00002b563f72a7e0(0000) GS:ffffffff805b1000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000003d2f530eb0 CR3: 0000000bf4c31000 CR4: 00000000000006e0 Process rpciod/0 (pid: 5496, threadinfo ffff8103f7cb8000, task ffff8101f8f00080)Stack: ffffffff80461dad ffffffff80263b5b ffff8109ffd3db60 ffff8109ffd3db60 ffff810bf7e73b00 ffff810bf7e73b08 ffff810bf7e73d58 ffffffff8025d8c0 ffff8109f84c0740 ffffffff88210c6b ffff810bf7e73b08 0000000000000000 Call Trace: [<ffffffff80461dad>] .text.lock.spinlock+0x67/0x8a [<ffffffff80263b5b>] test_clear_page_writeback+0x4b/0xd0 [<ffffffff8025d8c0>] end_page_writeback+0x20/0x50 [<ffffffff88210c6b>] :nfs:nfs_writeback_done_full+0xfb/0x180 [<ffffffff881263d5>] :sunrpc:rpc_exit_task+0x25/0x80 [<ffffffff881269b3>] :sunrpc:__rpc_execute+0xa3/0x1f0 [<ffffffff802406d0>] run_workqueue+0xb0/0x100 [<ffffffff80243e60>] keventd_create_kthread+0x0/0x80 [<ffffffff8024086a>] worker_thread+0x14a/0x190 [<ffffffff80228f50>] default_wake_function+0x0/0x10 [<ffffffff80228f50>] default_wake_function+0x0/0x10 [<ffffffff80240720>] worker_thread+0x0/0x190 [<ffffffff80243e19>] kthread+0xd9/0x120 [<ffffffff8020aaae>] child_rip+0x8/0x12 [<ffffffff80243e60>] keventd_create_kthread+0x0/0x80 [<ffffffff80243d40>] kthread+0x0/0x120 [<ffffffff8020aaa6>] child_rip+0x0/0x12 Code: 81 38 00 00 00 01 75 f6 f0 81 28 00 00 00 01 0f 85 e2 ff ff console shuts up ... <..> and so on for every CPU in the box.
Created attachment 8769 [details] Patch for NFS-instability on 2.6.15.x
And it keeps happening on -rc3 and -rc4... I've attached the patch that made the 2.6.15-series work alright, but I'm not sure whether it can be applied to later kernels - a lot of it seems already in-place.
Created attachment 8773 [details] Fix a potential deadlock in nfs_release_page From: Nikita Danilov <nikita@clusterfs.com> nfs_wb_page() waits on request completion and, as a result, is not safe to be called from nfs_release_page() invoked by VM scanner as part of GFP_NOFS allocation. Fix possible deadlock by analyzing gfp mask and refusing to release page if __GFP_FS is not set. Signed-off-by: Nikita Danilov <danilov@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
System has been stable for 9 days now, I think the bug can be closed!