Most recent kernel where this bug did not occur: 2.6.19-rc3
Distribution: Debian
Hardware Environment: i686
Software Environment: nfs-kernel-server 1.0.10
Problem Description: The server doesn't start:

Nov 4 12:46:00 frodo kernel: nfsd: last server has exited
Nov 4 12:46:00 frodo kernel: nfsd: unexporting all filesystems
Nov 4 12:46:01 frodo nfsd[3846]: nfssvc: writting fds to kernel failed: errno 0 (Success)
Nov 4 12:46:01 frodo nfsd[3846]: nfssvc: writting fds to kernel failed: errno 0 (Success)
Nov 4 12:46:01 frodo kernel: BUG: scheduling while atomic: nfs-kernel-serv/0x00000001/3836
Nov 4 12:46:01 frodo kernel: [<c03f4c7e>] schedule+0x4be/0x680
Nov 4 12:46:01 frodo kernel: [<c01174d9>] __wake_up_common+0x39/0x60
Nov 4 12:46:01 frodo kernel: [<c03f4f4e>] io_schedule+0xe/0x20
Nov 4 12:46:01 frodo kernel: [<c017fdce>] sync_buffer+0x2e/0x40
Nov 4 12:46:01 frodo kernel: [<c03f5925>] __wait_on_bit+0x45/0x80
Nov 4 12:46:01 frodo kernel: [<c017fda0>] sync_buffer+0x0/0x40
Nov 4 12:46:01 frodo kernel: [<c017fda0>] sync_buffer+0x0/0x40
Nov 4 12:46:01 frodo kernel: [<c03f59b5>] out_of_line_wait_on_bit+0x55/0x60
Nov 4 12:46:01 frodo kernel: [<c012f240>] wake_bit_function+0x0/0x40
Nov 4 12:46:01 frodo kernel: [<c017fd44>] __wait_on_buffer+0x24/0x40
Nov 4 12:46:01 frodo kernel: [<c0180f07>] __bread+0x67/0xa0
Nov 4 12:46:01 frodo kernel: [<c019c06b>] read_block_bitmap+0x2b/0x80
Nov 4 12:46:01 frodo kernel: [<c019d4c9>] ext3_new_blocks+0x2c9/0x6c0
Nov 4 12:46:01 frodo kernel: [<c03f4a80>] schedule+0x2c0/0x680
Nov 4 12:46:01 frodo kernel: [<c01a07cf>] ext3_get_blocks_handle+0x28f/0xac0
Nov 4 12:46:01 frodo kernel: [<c03f4e7d>] preempt_schedule+0x3d/0x60
Nov 4 12:46:01 frodo kernel: [<c0383c9a>] dev_queue_xmit+0x9a/0x320
Nov 4 12:46:01 frodo kernel: [<c017f25d>] __find_get_block_slow+0x5d/0x140
Nov 4 12:46:01 frodo kernel: [<c03dc8db>] packet_rcv_spkt+0xbb/0x140
Nov 4 12:46:01 frodo kernel: [<c01a1305>] ext3_get_block+0x65/0xe0
Nov 4 12:46:01 frodo kernel: [<c018024a>] __block_prepare_write+0x22a/0x480
Nov 4 12:46:01 frodo kernel: [<c01af355>] start_this_handle+0x75/0x480
Nov 4 12:46:01 frodo kernel: [<c01af828>] journal_start+0xc8/0x100
Nov 4 12:46:01 frodo kernel: [<c01804c2>] block_prepare_write+0x22/0x40
Nov 4 12:46:01 frodo kernel: [<c01a12a0>] ext3_get_block+0x0/0xe0
Nov 4 12:46:01 frodo kernel: [<c01a2900>] ext3_prepare_write+0x40/0x180
Nov 4 12:46:01 frodo kernel: [<c01a12a0>] ext3_get_block+0x0/0xe0
Nov 4 12:46:01 frodo kernel: [<c01a28c0>] ext3_prepare_write+0x0/0x180
Nov 4 12:46:01 frodo kernel: [<c013ffd3>] generic_file_buffered_write+0x2b3/0x700
Nov 4 12:46:01 frodo kernel: [<c01ae73d>] __journal_file_buffer+0x9d/0x260
Nov 4 12:46:01 frodo kernel: [<c0120a63>] current_fs_time+0x43/0x60
Nov 4 12:46:01 frodo kernel: [<c01406b8>] __generic_file_aio_write_nolock+0x298/0x5c0
Nov 4 12:46:01 frodo kernel: [<c01a0160>] ext3_mark_inode_dirty+0x20/0x40
Nov 4 12:46:01 frodo kernel: [<c017f61a>] __find_get_block+0x7a/0x1c0
Nov 4 12:46:01 frodo kernel: [<c0140a32>] generic_file_aio_write+0x52/0xe0
Nov 4 12:46:01 frodo kernel: [<c019e524>] ext3_file_write+0x24/0xa0
Nov 4 12:46:01 frodo kernel: [<c015dc55>] do_sync_write+0xd5/0x120
Nov 4 12:46:01 frodo kernel: [<c012f200>] autoremove_wake_function+0x0/0x40
Nov 4 12:46:01 frodo kernel: [<c01a82d9>] __ext3_journal_stop+0x19/0x40
Nov 4 12:46:01 frodo kernel: [<c017bac8>] __mark_inode_dirty+0x28/0x1a0
Nov 4 12:46:01 frodo kernel: [<c01a82d9>] __ext3_journal_stop+0x19/0x40
Nov 4 12:46:0
Nov 4 12:46:01 frodo kernel: [<c0102718>] do_notify_resume+0x78/0x6e0
Nov 4 12:46:01 frodo kernel: [<c0116164>] force_sig_info_fault+0x24/0x40
Nov 4 12:46:01 frodo kernel: [<c0190380>] proc_delete_inode+0x0/0x80
Nov 4 12:46:01 frodo kernel: [<c0171ce0>] destroy_inode+0x20/0x40
Nov 4 12:46:01 frodo kernel: [<c0171911>] iput+0x51/0x80
Nov 4 12:46:01 frodo kernel: [<c0116583>] do_page_fault+0x363/0x5c0
Nov 4 12:46:01 frodo kernel: [<c0116220>] do_page_fault+0x0/0x5c0
Nov 4 12:46:01 frodo kernel: [<c01031ce>] work_notifysig+0x13/0x25
Nov 4 12:46:01 frodo kernel: =======================
Nov 4 12:46:01 frodo kernel: note: nfs-kernel-serv[3836] exited with preempt_count 1

Steps to reproduce: Restarting nfs-kernel-server by hand produces a core dump.
Hmm... There are several problems here. The first 4 lines there appear to be from nfs-utils (the 'writting' errors) and would explain why nfsd is failing to start. That BUG, though, appears to be an ext3 problem. I suggest you file it as a separate ext3 bug in this bugzilla. Cc'ing Neil for the nfsd problem...
I'd expect that ext3 is just collateral damage here. Some piece of code got the preempt_count wrong (something like a missed unlock on an error path) and then the next thing which tried to schedule detected it.
*** Bug 7459 has been marked as a duplicate of this bug. ***
Created attachment 9405: net/sunrpc/ diff between 2.6.19-rc3 and 2.6.19-rc4. Elimar, if I understand you correctly, this issue is not present in 2.6.19-rc3 but is reproducible in 2.6.19-rc4? There were no ext3 changes between -rc3 and -rc4, and the only NFS-related changes were Neil's "sunrpc: fix refcounting problems in rpc servers" (plus a later minor cleanup of that patch by Andrew). If you take 2.6.19-rc4 and revert this attached patch, does it fix the issue?
Created attachment 9406: net/sunrpc/ diff between 2.6.19-rc3 and 2.6.19-rc4. The right attachment...
Hi,

reverting the patch works:

Nov 5 10:24:19 frodo nfsd[3173]: nfssvc: writting fds to kernel failed: errno 0 (Success)
Nov 5 10:24:19 frodo nfsd[3173]: nfssvc: writting fds to kernel failed: errno 0 (Success)
...
Nov 5 10:25:14 frodo mountd[3185]: authenticated mount request from aragorn.home.lxtec.de:629 for /source (/source)

Elimar
The real problem (which reverting the patch hides rather than fixes) is fixed by http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc4/2.6.19-rc4-mm2/broken-out/sunrpc-add-missing-spin_unlock.patch. I had thought that would go in at the same time as the other patch (which exposed the bug).
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc4/2.6.19-rc4-mm2/broken-out/sunrpc-add-missing-spin_unlock.patch has fixed the bug. I'll close this then. Thanks, Elimar