Bug 15234 - recurring fscache oops on 2.6.32
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-05 15:15 UTC by Rich Ercolani
Modified: 2011-03-21 19:46 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.32.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
.config of oopsing kernel (99.84 KB, application/octet-stream)
2010-02-05 15:15 UTC, Rich Ercolani
entire dmesg of affected machine (58.29 KB, text/x-log)
2010-02-05 15:17 UTC, Rich Ercolani
NFS: Fix a bug in nfs_fscache_release_page() (935 bytes, patch)
2010-02-05 15:28 UTC, Trond Myklebust

Description Rich Ercolani 2010-02-05 15:15:08 UTC
Created attachment 24918
.config of oopsing kernel

I don't know what triggers it, but twice in the last day I've had a panic in the FScache code on 2.6.32.7.

Fascinating, because cachefilesd isn't running, and there are no caches extant in its config file anyway.

Nevertheless, I get the following Oops, and various NFS-using processes go into the D (uninterruptible sleep) state forever:

------------[ cut here ]------------
kernel BUG at fs/nfs/fscache.c:360!
invalid opcode: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host3/target3:0:0/3:0:0:0/block/sdd/size
CPU 0 
Modules linked in: autofs4 coretemp hwmon nfs lockd nfs_acl auth_rpcgss sunrpc cachefiles fscache ipv6 cpufreq_ondemand acpi_cpufreq freq_table kvm_intel kvm snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_usb_audio snd_usb_lib snd_mixer_oss snd_pcm snd_rawmidi snd_seq_device snd_timer e1000e snd_hwdep snd uvcvideo videodev usb_storage v4l1_compat firewire_ohci snd_page_alloc firewire_core ppdev v4l2_compat_ioctl32 iTCO_wdt iTCO_vendor_support soundcore i82975x_edac i2c_i801 pcspkr i2c_core edac_core crc_itu_t parport_pc parport raid1 [last unloaded: scsi_wait_scan]
Pid: 14386, comm: php-cgi Not tainted 2.6.32.7-fun #1         
RIP: 0010:[<ffffffffa0354f63>]  [<ffffffffa0354f63>] nfs_fscache_release_page+0x2f/0x9e [nfs]
RSP: 0018:ffff88006a3252d8  EFLAGS: 00210246
RAX: ffff880063d0e928 RBX: ffffea00039567d8 RCX: ffff880063d0e710
RDX: 0020000000000009 RSI: 00000000000852d0 RDI: ffffea00039567d8
RBP: ffff88006a3252f8 R08: ffff88006a325280 R09: 0000000000000012
R10: ffff880028214258 R11: 000000000000ead0 R12: 0000000000000000
R13: 00000000000852d0 R14: ffff88006a325578 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f777d730
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f74cf000 CR3: 0000000043e81000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process php-cgi (pid: 14386, threadinfo ffff88006a324000, task ffff8800af802040)
Stack:
 0000000000000020 ffffea00039567d8 00000000000852d0 ffff88006a325758
<0> ffff88006a325318 ffffffffa0331000 ffffea0003956820 ffffea00039567d8
<0> ffff88006a325328 ffffffff810cc268 ffff88006a325428 ffffffff810d83c3
Call Trace:
 [<ffffffffa0331000>] nfs_release_page+0x41/0x46 [nfs]
 [<ffffffff810cc268>] try_to_release_page+0x37/0x40
 [<ffffffff810d83c3>] shrink_page_list+0x2d8/0x44c
 [<ffffffff81431a05>] ? _spin_unlock_irqrestore+0x70/0x7e
 [<ffffffff8106cd68>] ? finish_wait+0x66/0x6e
 [<ffffffff810e3664>] ? congestion_wait+0x80/0x8f
 [<ffffffff810d897e>] shrink_inactive_list+0x447/0x655
 [<ffffffff8121dffa>] ? __up_read+0x1a/0x7f
 [<ffffffff810d460c>] ? determine_dirtyable_memory+0x1a/0x2d
 [<ffffffff810d4698>] ? get_dirty_limits+0x27/0x25e
 [<ffffffff810d8ec8>] shrink_list+0xb7/0xc2
 [<ffffffff810d9196>] shrink_zone+0x2c3/0x361
 [<ffffffff810d947f>] ? shrink_slab+0x14f/0x161
 [<ffffffff810da373>] do_try_to_free_pages+0x1e9/0x352
 [<ffffffff810da5d6>] try_to_free_pages+0x6e/0x70
 [<ffffffff810d7c62>] ? isolate_pages_global+0x0/0x22a
 [<ffffffff8142f852>] ? _cond_resched+0xe/0x22
 [<ffffffff810d3828>] __alloc_pages_nodemask+0x40f/0x66f
 [<ffffffff810fb943>] alloc_pages_current+0x95/0x9e
 [<ffffffff811030cf>] alloc_slab_page+0x1b/0x28
 [<ffffffff81103133>] new_slab+0x57/0x1dc
 [<ffffffff811038c6>] __slab_alloc+0x266/0x43d
 [<ffffffffa033280d>] ? nfs_alloc_inode+0x1a/0x64 [nfs]
 [<ffffffffa033280d>] ? nfs_alloc_inode+0x1a/0x64 [nfs]
 [<ffffffff81103d30>] kmem_cache_alloc+0x9e/0x156
 [<ffffffffa033446a>] ? nfs_find_actor+0x0/0x73 [nfs]
 [<ffffffffa033280d>] nfs_alloc_inode+0x1a/0x64 [nfs]
 [<ffffffff81122542>] alloc_inode+0x1d/0x7b
 [<ffffffff81123246>] iget5_locked+0x63/0x11d
 [<ffffffffa0332857>] ? nfs_init_locked+0x0/0x3c [nfs]
 [<ffffffffa0333f93>] nfs_fhget+0x55/0x52c [nfs]
 [<ffffffffa032f79c>] nfs_lookup+0x124/0x18d [nfs]
 [<ffffffff81121304>] ? d_alloc+0x181/0x1b8
 [<ffffffff81121304>] ? d_alloc+0x181/0x1b8
 [<ffffffff81434448>] ? sub_preempt_count+0x9/0x83
 [<ffffffff81431987>] ? _spin_unlock+0x4a/0x58
 [<ffffffff81117cd7>] do_lookup+0xd1/0x167
 [<ffffffff811186a9>] __link_path_walk+0x591/0x6c2
 [<ffffffff8111897a>] path_walk+0x4c/0x8f
 [<ffffffff81221880>] ? strncpy_from_user+0x3d/0x41
 [<ffffffff81119ec5>] do_path_lookup+0x2a/0x8b
 [<ffffffff8111b4fe>] user_path_at+0x56/0x93
 [<ffffffff81125d73>] ? mntput_no_expire+0x29/0xfa
 [<ffffffff81117968>] ? mntput+0x1d/0x1f
 [<ffffffff8111331d>] vfs_fstatat+0x37/0x62
 [<ffffffff811133a3>] vfs_lstat+0x1e/0x20
 [<ffffffff81038936>] sys32_lstat64+0x1f/0x39
 [<ffffffff81117a91>] ? path_put+0x22/0x26
 [<ffffffff814310f2>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8103744f>] sysenter_dispatch+0x7/0x33
 [<ffffffff814310b3>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Code: 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 47 18 48 89 fb 48 8b 00 41 89 f5 4c 8b 60 f8 48 8d 88 e8 fd ff ff 4d 85 e4 75 04 <0f> 0b eb fe 48 8b 17 b8 01 00 00 00 80 e6 10 74 56 f6 05 62 e3 
RIP  [<ffffffffa0354f63>] nfs_fscache_release_page+0x2f/0x9e [nfs]
 RSP <ffff88006a3252d8>
---[ end trace 3cc500df7f6b03e1 ]---
Comment 1 Rich Ercolani 2010-02-05 15:17:15 UTC
Created attachment 24919
entire dmesg of affected machine
Comment 2 Trond Myklebust 2010-02-05 15:23:32 UTC
That BUG_ON looks bogus to me.
Comment 3 Rich Ercolani 2010-02-05 15:25:52 UTC
The BUG_ON I'm not so worried about, unless your intent is to claim it's corrupting other state.

The hanging NFS processes are what I'm worried about.
Comment 4 Rich Ercolani 2010-02-05 15:26:42 UTC
Ah, I misparsed you as referring to the "disabling lock validator" lines above the fscache printout. Never mind.
Comment 5 Trond Myklebust 2010-02-05 15:28:57 UTC
Created attachment 24920
NFS: Fix a bug in nfs_fscache_release_page()

Not having an fscache cookie is perfectly valid if the user didn't mount
with the fscache option.
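
The point above can be illustrated with a small userspace model. This is only a sketch, not the attached patch: the types and function names (model_page, model_cookie, release_unconditional_check, release_guarded_check) are hypothetical stand-ins, and guarding the check on the page's cache flag is an assumption about how such a fix could look, not a statement of what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; these are NOT the real
 * struct page / struct fscache_cookie definitions. */
struct model_cookie {
    int id;
};

struct model_page {
    bool in_fscache;             /* models PageFsCache(page) */
    struct model_cookie *cookie; /* models the inode's fscache cookie;
                                    NULL when mounted without fscache */
};

enum release_result {
    RELEASE_OK,  /* nothing blocks releasing the page */
    RELEASE_BUG  /* invariant violated; the kernel would hit BUG() here */
};

/* Behaviour seen in the oops: a missing cookie is always treated as fatal. */
enum release_result release_unconditional_check(const struct model_page *page)
{
    assert(page != NULL);
    if (page->cookie == NULL)
        return RELEASE_BUG;
    return RELEASE_OK;
}

/* Assumed shape of a fix: only insist on a cookie when the page is
 * actually tracked by the cache. */
enum release_result release_guarded_check(const struct model_page *page)
{
    assert(page != NULL);
    if (page->in_fscache && page->cookie == NULL)
        return RELEASE_BUG;
    return RELEASE_OK;
}
```

In this model, a page from a mount without the fscache option (no cookie, not in the cache) trips the unconditional check but passes the guarded one, which matches the symptom reported here: the oops fires even though no cache is configured.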
Comment 6 David Howells 2010-02-09 14:58:44 UTC
Trond's patch looks good.
Comment 7 Rich Ercolani 2010-02-09 21:46:48 UTC
Confirming bug goes away with patch applied.
