Bug 7457 - nfs-kernel-server does not start
Summary: nfs-kernel-server does not start
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Neil Brown
URL:
Keywords:
Duplicates: 7459
Depends on:
Blocks:
 
Reported: 2006-11-04 04:19 UTC by Elimar Riesebieter
Modified: 2006-11-05 14:50 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.19-rc4
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
net/sunrpc/ diff between 2.6.19-rc3 and 2.6.19-rc4 (1.91 KB, patch)
2006-11-04 22:07 UTC, Adrian Bunk
net/sunrpc/ diff between 2.6.19-rc3 and 2.6.19-rc4 (1.86 KB, patch)
2006-11-04 22:09 UTC, Adrian Bunk

Description Elimar Riesebieter 2006-11-04 04:19:57 UTC
Most recent kernel where this bug did not occur: 2.6.19-rc3
Distribution: Debian
Hardware Environment: i686
Software Environment: nfs-kernel-server 1.0.10
Problem Description: The server doesn't start
Nov  4 12:46:00 frodo kernel: nfsd: last server has exited
Nov  4 12:46:00 frodo kernel: nfsd: unexporting all filesystems
Nov  4 12:46:01 frodo nfsd[3846]: nfssvc: writting fds to kernel failed: errno 0
(Success)
Nov  4 12:46:01 frodo nfsd[3846]: nfssvc: writting fds to kernel failed: errno 0
(Success)
Nov  4 12:46:01 frodo kernel: BUG: scheduling while atomic:
nfs-kernel-serv/0x00000001/3836
Nov  4 12:46:01 frodo kernel:  [<c03f4c7e>] schedule+0x4be/0x680
Nov  4 12:46:01 frodo kernel:  [<c01174d9>] __wake_up_common+0x39/0x60
Nov  4 12:46:01 frodo kernel:  [<c03f4f4e>] io_schedule+0xe/0x20
Nov  4 12:46:01 frodo kernel:  [<c017fdce>] sync_buffer+0x2e/0x40
Nov  4 12:46:01 frodo kernel:  [<c03f5925>] __wait_on_bit+0x45/0x80
Nov  4 12:46:01 frodo kernel:  [<c017fda0>] sync_buffer+0x0/0x40
Nov  4 12:46:01 frodo kernel:  [<c017fda0>] sync_buffer+0x0/0x40
Nov  4 12:46:01 frodo kernel:  [<c03f59b5>] out_of_line_wait_on_bit+0x55/0x60
Nov  4 12:46:01 frodo kernel:  [<c012f240>] wake_bit_function+0x0/0x40
Nov  4 12:46:01 frodo kernel:  [<c017fd44>] __wait_on_buffer+0x24/0x40
Nov  4 12:46:01 frodo kernel:  [<c0180f07>] __bread+0x67/0xa0
Nov  4 12:46:01 frodo kernel:  [<c019c06b>] read_block_bitmap+0x2b/0x80
Nov  4 12:46:01 frodo kernel:  [<c019d4c9>] ext3_new_blocks+0x2c9/0x6c0
Nov  4 12:46:01 frodo kernel:  [<c03f4a80>] schedule+0x2c0/0x680
Nov  4 12:46:01 frodo kernel:  [<c01a07cf>] ext3_get_blocks_handle+0x28f/0xac0
Nov  4 12:46:01 frodo kernel:  [<c03f4e7d>] preempt_schedule+0x3d/0x60
Nov  4 12:46:01 frodo kernel:  [<c0383c9a>] dev_queue_xmit+0x9a/0x320
Nov  4 12:46:01 frodo kernel:  [<c017f25d>] __find_get_block_slow+0x5d/0x140
Nov  4 12:46:01 frodo kernel:  [<c03dc8db>] packet_rcv_spkt+0xbb/0x140
Nov  4 12:46:01 frodo kernel:  [<c01a1305>] ext3_get_block+0x65/0xe0
Nov  4 12:46:01 frodo kernel:  [<c018024a>] __block_prepare_write+0x22a/0x480
Nov  4 12:46:01 frodo kernel:  [<c01af355>] start_this_handle+0x75/0x480
Nov  4 12:46:01 frodo kernel:  [<c01af828>] journal_start+0xc8/0x100
Nov  4 12:46:01 frodo kernel:  [<c01804c2>] block_prepare_write+0x22/0x40
Nov  4 12:46:01 frodo kernel:  [<c01a12a0>] ext3_get_block+0x0/0xe0
Nov  4 12:46:01 frodo kernel:  [<c01a2900>] ext3_prepare_write+0x40/0x180
Nov  4 12:46:01 frodo kernel:  [<c01a12a0>] ext3_get_block+0x0/0xe0
Nov  4 12:46:01 frodo kernel:  [<c01a28c0>] ext3_prepare_write+0x0/0x180
Nov  4 12:46:01 frodo kernel:  [<c013ffd3>] generic_file_buffered_write+0x2b3/0x700
Nov  4 12:46:01 frodo kernel:  [<c01ae73d>] __journal_file_buffer+0x9d/0x260
Nov  4 12:46:01 frodo kernel:  [<c0120a63>] current_fs_time+0x43/0x60
Nov  4 12:46:01 frodo kernel:  [<c01406b8>]
__generic_file_aio_write_nolock+0x298/0x5c0
Nov  4 12:46:01 frodo kernel:  [<c01a0160>] ext3_mark_inode_dirty+0x20/0x40
Nov  4 12:46:01 frodo kernel:  [<c017f61a>] __find_get_block+0x7a/0x1c0
Nov  4 12:46:01 frodo kernel:  [<c0140a32>] generic_file_aio_write+0x52/0xe0
Nov  4 12:46:01 frodo kernel:  [<c019e524>] ext3_file_write+0x24/0xa0
Nov  4 12:46:01 frodo kernel:  [<c015dc55>] do_sync_write+0xd5/0x120
Nov  4 12:46:01 frodo kernel:  [<c012f200>] autoremove_wake_function+0x0/0x40
Nov  4 12:46:01 frodo kernel:  [<c01a82d9>] __ext3_journal_stop+0x19/0x40
Nov  4 12:46:01 frodo kernel:  [<c017bac8>] __mark_inode_dirty+0x28/0x1a0
Nov  4 12:46:01 frodo kernel:  [<c01a82d9>] __ext3_journal_stop+0x19/0x40
Nov  4 12:46:01 frodo kernel:  [<c0102718>]
do_notify_resume+0x78/0x6e0
Nov  4 12:46:01 frodo kernel:  [<c0116164>] force_sig_info_fault+0x24/0x40
Nov  4 12:46:01 frodo kernel:  [<c0190380>] proc_delete_inode+0x0/0x80
Nov  4 12:46:01 frodo kernel:  [<c0171ce0>] destroy_inode+0x20/0x40
Nov  4 12:46:01 frodo kernel:  [<c0171911>] iput+0x51/0x80
Nov  4 12:46:01 frodo kernel:  [<c0116583>] do_page_fault+0x363/0x5c0
Nov  4 12:46:01 frodo kernel:  [<c0116220>] do_page_fault+0x0/0x5c0
Nov  4 12:46:01 frodo kernel:  [<c01031ce>] work_notifysig+0x13/0x25
Nov  4 12:46:01 frodo kernel:  =======================
Nov  4 12:46:01 frodo kernel: note: nfs-kernel-serv[3836] exited with
preempt_count 1

Steps to reproduce:
Restarting nfs-kernel-server by hand produces a core dump
Comment 1 Trond Myklebust 2006-11-04 05:31:34 UTC
Hmm... There are several problems here.

The first 4 lines there appear to be from nfs-utils (the 'writting' errors) and
would explain why nfsd is failing to start.

That BUG, though, would appear to be an ext3 problem. I suggest you file that as
a separate ext3 bug in this bugzilla.

CCed Neil for the nfsd problem...
Comment 2 Andrew Morton 2006-11-04 11:46:41 UTC
I'd expect that ext3 is just collateral damage here.  Some piece of code
got the preempt_count wrong (something like a missed unlock on an error
path) and then the next thing which tried to schedule detected it.
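
As an illustration of this failure mode, here is a minimal userspace sketch. It is
a toy model, not the actual sunrpc or scheduler code: `preempt_count` here is an
ordinary global, and `toy_spin_lock`, `toy_spin_unlock`, `buggy_register`, and
`toy_schedule_is_atomic` are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-task preempt count (assumption: userspace only). */
static int preempt_count = 0;

/* Taking a spinlock disables preemption, so the count goes up. */
static void toy_spin_lock(void)
{
    preempt_count++;
}

/* Releasing the lock re-enables preemption, so the count goes back down. */
static void toy_spin_unlock(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* Hypothetical buggy function: the error path returns with the lock held. */
static int buggy_register(bool fail)
{
    toy_spin_lock();
    if (fail)
        return -1;          /* BUG: no toy_spin_unlock() on this path */
    toy_spin_unlock();
    return 0;
}

/*
 * Stand-in for the check done when a task tries to sleep: sleeping is only
 * legal when the count is zero. Returns true if the caller is still atomic.
 */
static bool toy_schedule_is_atomic(void)
{
    if (preempt_count != 0) {
        printf("BUG: scheduling while atomic: preempt_count %d\n",
               preempt_count);
        return true;
    }
    return false;
}
```

In this model, calling `buggy_register(true)` leaves the count at 1, and the
complaint only appears later, when unrelated code calls
`toy_schedule_is_atomic()`. That is the sense in which the ext3 write path in the
trace above would be the messenger rather than the culprit.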
Comment 3 Andrew Morton 2006-11-04 11:47:34 UTC
*** Bug 7459 has been marked as a duplicate of this bug. ***
Comment 4 Adrian Bunk 2006-11-04 22:07:54 UTC
Created attachment 9405 [details]
net/sunrpc/ diff between 2.6.19-rc3 and 2.6.19-rc4

Elimar, if I understand you correctly, this issue is not present in 2.6.19-rc3,
but is reproducibly present in 2.6.19-rc4?

There were no ext3 changes between -rc3 and -rc4, and the only NFS-related
change was Neil's "sunrpc: fix refcounting problems in rpc servers" (plus a
later minor cleanup by Andrew for this patch).

If you take 2.6.19-rc4 and revert this attached patch, does it fix the issue?
Comment 5 Adrian Bunk 2006-11-04 22:09:23 UTC
Created attachment 9406 [details]
net/sunrpc/ diff between 2.6.19-rc3 and 2.6.19-rc4

The right attachment...
Comment 6 Elimar Riesebieter 2006-11-05 01:31:41 UTC
Hi,
reverting the patch works:

Nov  5 10:24:19 frodo nfsd[3173]: nfssvc: writting fds to kernel failed: errno 0
(Success)
Nov  5 10:24:19 frodo nfsd[3173]: nfssvc: writting fds to kernel failed: errno 0
(Success)
...
Nov  5 10:25:14 frodo mountd[3185]: authenticated mount request from
aragorn.home.lxtec.de:629 for /source (/source)

Elimar
Comment 7 Neil Brown 2006-11-05 14:11:44 UTC
The real problem (which reverting the patch hides rather than fixes) is
fixed by

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc4/2.6.19-rc4-mm2/broken-out/sunrpc-add-missing-spin_unlock.patch

I had thought that would go in at the same time as the other patch (which exposed
the bug).
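
The contents of the linked patch are not reproduced here. As a rough sketch of the
general shape of a "missing unlock" fix, the following userspace example routes
every exit, including the error path, through a single unlock. Everything in it is
an assumption made for illustration: the `table_*` names, the fixed-size table, and
the C11 `atomic_flag` spinlock are not taken from the sunrpc code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_ENTRIES 4

/* Hypothetical shared table guarded by a tiny userspace spinlock. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static int table[MAX_ENTRIES];
static int table_len = 0;

static void table_spin_lock(void)
{
    while (atomic_flag_test_and_set(&table_lock))
        ;   /* spin until the flag was previously clear */
}

static void table_spin_unlock(void)
{
    atomic_flag_clear(&table_lock);
}

/* Returns true if the lock is still held, i.e. some path forgot to unlock. */
static bool table_lock_is_held(void)
{
    if (atomic_flag_test_and_set(&table_lock))
        return true;
    atomic_flag_clear(&table_lock);
    return false;
}

/* Add a value; every exit, including the error path, goes through "out". */
static int table_add(int value)
{
    int err = 0;

    table_spin_lock();
    if (table_len == MAX_ENTRIES) {
        err = -1;           /* error path: jump to the unlock, do not return */
        goto out;
    }
    table[table_len++] = value;
out:
    table_spin_unlock();    /* the unlock that an early return would skip */
    return err;
}
```

With this layout, a failing `table_add()` call still leaves
`table_lock_is_held()` returning false, so a later caller that needs to sleep or
take the lock again is not affected by the earlier error.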
Comment 8 Elimar Riesebieter 2006-11-05 14:50:38 UTC
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc4/2.6.19-rc4-mm2/broken-out/sunrpc-add-missing-spin_unlock.patch

has fixed the bug. Will close then.

Thanks,
Elimar
