Created attachment 24918 [details]
.config of oopsing kernel

I don't know what triggers it, but twice in the last day I've had a panic in the FScache code on 2.6.32.7. Fascinating, because cachefilesd isn't running, and there are no caches extant in its config file anyway. Nevertheless, I get the following Oops, and various NFS-using processes go into D forever:

------------[ cut here ]------------
kernel BUG at fs/nfs/fscache.c:360!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host3/target3:0:0/3:0:0:0/block/sdd/size
CPU 0
Modules linked in: autofs4 coretemp hwmon nfs lockd nfs_acl auth_rpcgss sunrpc cachefiles fscache ipv6 cpufreq_ondemand acpi_cpufreq freq_table kvm_intel kvm snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_usb_audio snd_usb_lib snd_mixer_oss snd_pcm snd_rawmidi snd_seq_device snd_timer e1000e snd_hwdep snd uvcvideo videodev usb_storage v4l1_compat firewire_ohci snd_page_alloc firewire_core ppdev v4l2_compat_ioctl32 iTCO_wdt iTCO_vendor_support soundcore i82975x_edac i2c_i801 pcspkr i2c_core edac_core crc_itu_t parport_pc parport raid1 [last unloaded: scsi_wait_scan]
Pid: 14386, comm: php-cgi Not tainted 2.6.32.7-fun #1
RIP: 0010:[<ffffffffa0354f63>] [<ffffffffa0354f63>] nfs_fscache_release_page+0x2f/0x9e [nfs]
RSP: 0018:ffff88006a3252d8 EFLAGS: 00210246
RAX: ffff880063d0e928 RBX: ffffea00039567d8 RCX: ffff880063d0e710
RDX: 0020000000000009 RSI: 00000000000852d0 RDI: ffffea00039567d8
RBP: ffff88006a3252f8 R08: ffff88006a325280 R09: 0000000000000012
R10: ffff880028214258 R11: 000000000000ead0 R12: 0000000000000000
R13: 00000000000852d0 R14: ffff88006a325578 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f777d730
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f74cf000 CR3: 0000000043e81000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process php-cgi (pid: 14386, threadinfo ffff88006a324000, task ffff8800af802040)
Stack:
 0000000000000020 ffffea00039567d8 00000000000852d0 ffff88006a325758
<0> ffff88006a325318 ffffffffa0331000 ffffea0003956820 ffffea00039567d8
<0> ffff88006a325328 ffffffff810cc268 ffff88006a325428 ffffffff810d83c3
Call Trace:
 [<ffffffffa0331000>] nfs_release_page+0x41/0x46 [nfs]
 [<ffffffff810cc268>] try_to_release_page+0x37/0x40
 [<ffffffff810d83c3>] shrink_page_list+0x2d8/0x44c
 [<ffffffff81431a05>] ? _spin_unlock_irqrestore+0x70/0x7e
 [<ffffffff8106cd68>] ? finish_wait+0x66/0x6e
 [<ffffffff810e3664>] ? congestion_wait+0x80/0x8f
 [<ffffffff810d897e>] shrink_inactive_list+0x447/0x655
 [<ffffffff8121dffa>] ? __up_read+0x1a/0x7f
 [<ffffffff810d460c>] ? determine_dirtyable_memory+0x1a/0x2d
 [<ffffffff810d4698>] ? get_dirty_limits+0x27/0x25e
 [<ffffffff810d8ec8>] shrink_list+0xb7/0xc2
 [<ffffffff810d9196>] shrink_zone+0x2c3/0x361
 [<ffffffff810d947f>] ? shrink_slab+0x14f/0x161
 [<ffffffff810da373>] do_try_to_free_pages+0x1e9/0x352
 [<ffffffff810da5d6>] try_to_free_pages+0x6e/0x70
 [<ffffffff810d7c62>] ? isolate_pages_global+0x0/0x22a
 [<ffffffff8142f852>] ? _cond_resched+0xe/0x22
 [<ffffffff810d3828>] __alloc_pages_nodemask+0x40f/0x66f
 [<ffffffff810fb943>] alloc_pages_current+0x95/0x9e
 [<ffffffff811030cf>] alloc_slab_page+0x1b/0x28
 [<ffffffff81103133>] new_slab+0x57/0x1dc
 [<ffffffff811038c6>] __slab_alloc+0x266/0x43d
 [<ffffffffa033280d>] ? nfs_alloc_inode+0x1a/0x64 [nfs]
 [<ffffffffa033280d>] ? nfs_alloc_inode+0x1a/0x64 [nfs]
 [<ffffffff81103d30>] kmem_cache_alloc+0x9e/0x156
 [<ffffffffa033446a>] ? nfs_find_actor+0x0/0x73 [nfs]
 [<ffffffffa033280d>] nfs_alloc_inode+0x1a/0x64 [nfs]
 [<ffffffff81122542>] alloc_inode+0x1d/0x7b
 [<ffffffff81123246>] iget5_locked+0x63/0x11d
 [<ffffffffa0332857>] ? nfs_init_locked+0x0/0x3c [nfs]
 [<ffffffffa0333f93>] nfs_fhget+0x55/0x52c [nfs]
 [<ffffffffa032f79c>] nfs_lookup+0x124/0x18d [nfs]
 [<ffffffff81121304>] ? d_alloc+0x181/0x1b8
 [<ffffffff81121304>] ? d_alloc+0x181/0x1b8
 [<ffffffff81434448>] ? sub_preempt_count+0x9/0x83
 [<ffffffff81431987>] ? _spin_unlock+0x4a/0x58
 [<ffffffff81117cd7>] do_lookup+0xd1/0x167
 [<ffffffff811186a9>] __link_path_walk+0x591/0x6c2
 [<ffffffff8111897a>] path_walk+0x4c/0x8f
 [<ffffffff81221880>] ? strncpy_from_user+0x3d/0x41
 [<ffffffff81119ec5>] do_path_lookup+0x2a/0x8b
 [<ffffffff8111b4fe>] user_path_at+0x56/0x93
 [<ffffffff81125d73>] ? mntput_no_expire+0x29/0xfa
 [<ffffffff81117968>] ? mntput+0x1d/0x1f
 [<ffffffff8111331d>] vfs_fstatat+0x37/0x62
 [<ffffffff811133a3>] vfs_lstat+0x1e/0x20
 [<ffffffff81038936>] sys32_lstat64+0x1f/0x39
 [<ffffffff81117a91>] ? path_put+0x22/0x26
 [<ffffffff814310f2>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8103744f>] sysenter_dispatch+0x7/0x33
 [<ffffffff814310b3>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Code: 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 47 18 48 89 fb 48 8b 00 41 89 f5 4c 8b 60 f8 48 8d 88 e8 fd ff ff 4d 85 e4 75 04 <0f> 0b eb fe 48 8b 17 b8 01 00 00 00 80 e6 10 74 56 f6 05 62 e3
RIP [<ffffffffa0354f63>] nfs_fscache_release_page+0x2f/0x9e [nfs]
 RSP <ffff88006a3252d8>
---[ end trace 3cc500df7f6b03e1 ]---
Created attachment 24919 [details]
entire dmesg of affected machine
That BUG_ON looks bogus to me.
I'm not so worried about the BUG_ON itself, unless your intent is to claim it's corrupting other state. What I'm worried about is the NFS processes I'm seeing hang.
Ah, I misparsed you as referring to the "disabling lock validator" lines above the fscache printout. Never mind.
Created attachment 24920 [details]
NFS: Fix a bug in nfs_fscache_release_page()

Not having an fscache cookie is perfectly valid if the user didn't mount with the fscache option.
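
For anyone following along, here is a rough user-space model of the point above. It is not the attached patch and not kernel code: the struct names, the `cached` flag (standing in for a page being marked as held by fscache), and both function names are made up for illustration. It only shows why an unconditional "cookie must exist" assertion trips on a mount without the fscache option, and how scoping the check to pages that are actually cached avoids that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_cookie { int id; };

struct model_inode {
    struct model_cookie *fscache;  /* NULL when mounted without fscache */
};

struct model_page {
    struct model_inode *host;
    bool cached;                   /* models "this page is held by fscache" */
};

enum {
    RELEASE_OK  = 1,   /* page may be released */
    RELEASE_BUG = -1   /* the point where a BUG_ON would fire */
};

/* Asserts on the cookie for every page, cached or not. */
int release_page_unconditional(const struct model_page *page)
{
    if (page->host->fscache == NULL)
        return RELEASE_BUG;
    /* A cached page would be handed back to the cache here. */
    return RELEASE_OK;
}

/* Only requires a cookie when the page is actually marked as cached. */
int release_page_guarded(const struct model_page *page)
{
    if (page->cached) {
        if (page->host->fscache == NULL)
            return RELEASE_BUG;    /* genuinely inconsistent state */
        /* A cached page would be handed back to the cache here. */
    }
    return RELEASE_OK;
}
```

With an inode whose `fscache` pointer is NULL and a page that is not marked cached, the unconditional version reports the BUG condition while the guarded version simply lets the page go, which matches the behaviour you would expect on a mount that never asked for caching.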
Trond's patch looks good.
Confirming bug goes away with patch applied.