Most recent kernel where this bug did not occur: 2.6.13 Distribution: Fedora Core 4 Hardware Environment: IBM T42P Software Environment: Vanilla 2.6.15 Problem Description: I have 'gkrellm' application permanently running. One of its tasks is to monitor battery status. This is achieved by reading ACPI /proc files. After some number of suspend/resume cycles, grkellm stalls; System continue to run; eventually it ceases to suspend. /var/log/messages gives: Jan 15 15:42:20 vkondra-mobl kernel: Unable to handle kernel paging request at virtual address 6b6b6b8b Jan 15 15:42:20 vkondra-mobl kernel: printing eip: Jan 15 15:42:20 vkondra-mobl kernel: c01340f3 Jan 15 15:42:20 vkondra-mobl kernel: *pde = 00000000 Jan 15 15:42:20 vkondra-mobl kernel: Oops: 0000 [#1] Jan 15 15:42:20 vkondra-mobl kernel: PREEMPT Jan 15 15:42:20 vkondra-mobl kernel: Modules linked in: nls_cp437 nls_utf8 smbfs nls_base bluetooth cpufreq_stats cpufreq_conserva tive cpufreq_performance cpufreq_powersave cpufreq_ondemand speedstep_centrino freq_table microcode evdev ipv6 lp autofs4 sunrpc d m_mirror dm_mod video thermal processor ibm_acpi fan container button battery ac parport_pc parport snd_intel8x0m snd_seq_dummy sn d_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss floppy pcmc ia ipw2200 snd_pcm usbhid crc32 ieee80211 ieee80211_crypt snd_timer snd soundcore yenta_socket i2c_i801 e1000 firmware_class snd_p age_alloc rsrc_nonstatic pcmcia_core ehci_hcd uhci_hcd i2c_core intel_agp agpgart hw_random usbcore ext3 jbd Jan 15 15:42:20 vkondra-mobl kernel: CPU: 0 Jan 15 15:42:20 vkondra-mobl kernel: EIP: 0060:[<c01340f3>] Not tainted VLI Jan 15 15:42:20 vkondra-mobl kernel: EFLAGS: 00010286 (2.6.15) Jan 15 15:42:20 vkondra-mobl kernel: EIP is at audit_inode+0x81/0xb2 Jan 15 15:42:20 vkondra-mobl kernel: eax: 00000101 ebx: ee671778 ecx: 00000001 edx: ee671778 Jan 15 15:42:20 vkondra-mobl kernel: esi: 6b6b6b6b edi: d97fb000 ebp: ee279eb8 esp: ee279ea8 Jan 15 15:42:20 vkondra-mobl kernel: ds: 007b es: 007b ss: 0068 Jan 15 15:42:20 vkondra-mobl kernel: Process gkrellm (pid: 2971, threadinfo=ee278000 task=eee82a90) Jan 15 15:42:20 vkondra-mobl kernel: Stack: 00000101 ee278000 ee279f44 d97fb000 ee279ed4 c015cd8b 00000101 fffffffe Jan 15 15:42:20 vkondra-mobl kernel: ee279f44 00000101 ffffffe9 ee279eec c015cdd4 d97fb000 d97fb000 ee279f44 Jan 15 15:42:20 vkondra-mobl kernel: 00000008 ee279efc c015ce10 00008001 00000000 ee279f38 c015d464 00008001 Jan 15 15:42:20 vkondra-mobl kernel: Call Trace: Jan 15 15:42:20 vkondra-mobl kernel: [<c0103bcc>] show_stack+0x7c/0x85 Jan 15 15:42:20 vkondra-mobl kernel: [<c0103d11>] show_registers+0x123/0x197 Jan 15 15:42:20 vkondra-mobl kernel: [<c0103ecb>] die+0xd2/0x145 Jan 15 15:42:20 vkondra-mobl kernel: [<c0292f45>] do_page_fault+0x373/0x521 Jan 15 15:42:20 vkondra-mobl kernel: [<c01038cb>] error_code+0x4f/0x54 Jan 15 15:42:20 vkondra-mobl kernel: [<c015cd8b>] path_lookup+0x179/0x183 Jan 15 15:42:20 vkondra-mobl kernel: [<c015cdd4>] __path_lookup_intent_open+0x3f/0x6e Jan 15 15:42:20 vkondra-mobl kernel: [<c015ce10>] path_lookup_open+0xd/0xf Jan 15 15:42:20 vkondra-mobl kernel: [<c015d464>] open_namei+0x63/0x588 Jan 15 15:42:20 vkondra-mobl kernel: [<c014f156>] filp_open+0x1c/0x31 Jan 15 15:42:20 vkondra-mobl kernel: [<c014f3ee>] do_sys_open+0x37/0xa8 Jan 15 15:42:20 vkondra-mobl kernel: [<c014f470>] sys_open+0x11/0x13 Jan 15 15:42:20 vkondra-mobl kernel: [<c0102d99>] syscall_call+0x7/0xb Jan 15 15:42:20 vkondra-mobl kernel: Code: 74 04 39 f8 74 1a 83 f9 0c 7f 4b 8d 41 01 89 ca 89 43 38 89 c8 c1 e0 05 c7 44 18 3c 00 00 00 00 8b 45 f0 c1 e2 05 01 da 89 42 58 <8b> 46 20 89 42 40 8b 86 c4 00 00 00 8b 40 08 89 42 44 8b 46 28 Steps to reproduce: run 2.6.15; run X11 with 'gkrellm' and battery monitor enabled. Perform several suspend/resume cycles. Monitor /var/log/messages.
Failure caused by garbage (6b6b6b6b in my case) in 'inode' parameter, and 1-st access to 'inode' member leads to page fault. Following is audit_inode function, line where page fault occurs, marked. void audit_inode(const char *name, const struct inode *inode, unsigned flags) { int idx; struct audit_context *context = current->audit_context; if (!context->in_syscall) return; if (context->name_count && context->names[context->name_count-1].name && context->names[context->name_count-1].name == name) idx = context->name_count - 1; else if (context->name_count > 1 && context->names[context->name_count-2].name && context->names[context->name_count-2].name == name) idx = context->name_count - 2; else { /* FIXME: how much do we care about inodes that have no * associated name? */ if (context->name_count >= AUDIT_NAMES - AUDIT_NAMES_RESERVED) return; idx = context->name_count++; context->names[idx].name = NULL; #if AUDIT_DEBUG ++context->ino_count; #endif } context->names[idx].flags = flags; ===fault on the next line===> context->names[idx].ino = inode->i_ino; context->names[idx].dev = inode->i_sb->s_dev; context->names[idx].mode = inode->i_mode; context->names[idx].uid = inode->i_uid; context->names[idx].gid = inode->i_gid; context->names[idx].rdev = inode->i_rdev; }
How does it relate to ACPI battery? What inode do you evaluate and where do you get it from?
Point is, I am not 100% sure it is battery who is guilty; but usually after this, `cat /proc/acpi/battery/BAT0/state` gives some form of error: `cat` process stall, or got reading error, or reads empty file. But it may be that this is some general bug in procfs and mentioned above file is simply one that happens to suffer most in my environment. I will add printk's around audit_inode to see more info.
Another variant of ACPI problem: after resume, the following appears in syslog: Jan 31 20:34:38 vkondra-mobl kernel: ACPI-0292: *** Error: Looking up [SERN] in namespace, AE_ALREADY_EXISTS Jan 31 20:34:38 vkondra-mobl kernel: ACPI-0508: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.GBIF] (Node f7f02354), AE_ALREADY_EXISTS Jan 31 20:34:41 vkondra-mobl kernel: ACPI-0213: *** Error: Method reached ma ximum reentrancy limit (255) Jan 31 20:34:41 vkondra-mobl kernel: ACPI-0508: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.BAT0._BIF] (Node f7f0219c), AE_AML_METHOD_LIMIT Jan 31 20:34:41 vkondra-mobl kernel: done Jan 31 20:34:44 vkondra-mobl kernel: ACPI-0213: *** Error: Method reached ma ximum reentrancy limit (255) .........etc. etc. now, /proc/acpi/battery/BAT0/state is OK but /proc/acpi/battery/BAT0/state is not: [vkondra-l@vkondra-mobl ~]$ cat /proc/acpi/battery/BAT0/state present: yes capacity state: ok charging state: charging present rate: 16330 mW remaining capacity: 61970 mWh present voltage: 12325 mV [vkondra-l@vkondra-mobl ~]$ cat /proc/acpi/battery/BAT0/info present: yes ERROR: Unable to read battery information
I am also looking into a similar issue. I encountered it while running fsracer tests on SLES10 B1, which is 2.6.15-git12-6
I think the bad Inode gets in from do_lookup(). I suspect __d_lookup and real_lookup.
Hi Vladimir, Could you check out the following patch ? Its against 2.6.15-rc7. The path_lookup() doesn't check whether the link_path_walk() successful or not. --- linux-2.6.15-rc7-I/fs/namei.c 2006-01-03 10:25:57.000000000 -0800 +++ linux-2.6.15-rc7-I/fs/namei.c.mod 2006-02-06 05:08:36.000000000 -0800 @@ -1090,7 +1090,7 @@ int fastcall path_lookup(const char *nam current->total_link_count = 0; retval = link_path_walk(name, nd); out: - if (unlikely(current->audit_context + if (unlikely(!retval && current->audit_context && nd && nd->dentry && nd->dentry->d_inode)) audit_inode(name, nd->dentry->d_inode, flags); return retval; Thanks.
Applied. I use 2.6.16-rc2-git2 (latest for today). Patch as is fails, but idea is clear so it is easy to adjust. I run watch -n 0 cat /proc/acpi/battery/BAT0/[is]* and all applets that I know to monitor battery state; I did couple of suspend/resume cycles. So far, so good. But, to be sure I need to run it for some more time. Let me test it for, say, one week from now. I will report immediately in case it oops again.
FYI.. The following patch is now in mm tree. diff -puN fs/namei.c~fix-do_path_lookup-to-add-the-check-for-error-in-link_path_walk fs/namei.c --- devel/fs/namei.c~fix-do_path_lookup-to-add-the-check-for-error-in-link_path_walk 2006-02-06 22:25:12.000000000 -0800 +++ devel-akpm/fs/namei.c 2006-02-06 22:27:45.000000000 -0800 @@ -1119,9 +1119,11 @@ static int fastcall do_path_lookup(int d current->total_link_count = 0; retval = link_path_walk(name, nd); out: - if (unlikely(current->audit_context - && nd && nd->dentry && nd->dentry->d_inode)) + if (likely(retval == 0)) { + if (unlikely(current->audit_context && nd && nd->dentry && + nd->dentry->d_inode)) audit_inode(name, nd->dentry->d_inode, flags); + } return retval; fput_unlock_fail:
I would consider it fixed. During week of intensive testing I have no single failure.
Closing as per Comment #10.