2.6.33-rc3 in 64bits mode Distro: Bluewhite64 13 (64bits ports of Slackware 13) Acer Aspire 1511Lmi laptop At first suspend after a fresh reboot, I got the following warning: WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88004cd08b1c) 00011000000000000002200000000000627574746f6e2f6c6964000000000000 i i i i i i i i i i i i i i i i i i i i i i i i i i i u u u u u ^ Pid: 3264, comm: hald Not tainted 2.6.33-rc3 #2 Aspire 1510/Aspire 1510 RIP: 0010:[<ffffffff812692c1>] [<ffffffff812692c1>] acpi_bus_receive_event+0xf5/0x119 RSP: 0018:ffff88004bfb5dd8 EFLAGS: 00010286 RAX: 0000000000000038 RBX: ffff88004cd08b00 RCX: 0000000000000007 RDX: 00000000ffffffed RSI: ffff88004cd08b1c RDI: ffff88004bfb5e64 RBP: ffff88004bfb5e28 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000282 R13: ffff88004bfb5e48 R14: ffff88004bfb5f48 R15: 0000000001da4420 FS: 00007f93fe93f6f0(0000) GS:ffffffff81a1c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88004f810c20 CR3: 000000004bf4d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 [<ffffffff8126f96a>] acpi_system_read_event+0x6b/0x115 [<ffffffff810fd2b9>] proc_reg_read+0x79/0xb0 [<ffffffff810b1867>] vfs_read+0xc7/0x160 [<ffffffff810b19f0>] sys_read+0x50/0x90 [<ffffffff81002468>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff
Why do you think it's a regression?
does this happen during suspend, or during resume? is that the only time it is possible to provoke it?
I think it's a regression because I've never got this warning until 2.6.33-rc3. Indeed, I used to lose network connectivity while suspending, but I never got an uninitialized read in acpi before now. Maybe this is just occuring now however, so it may be an existing problem that didn't occurred for whatever reason before. The warning occurs before the system goes to sleep, when I close the lid. I think it's when the event that calls /etc/acpi/acpi_handler.sh that triggers this. I'm testing -rc4 to see if it is still there, and will give the faulty line then.
I've forgotten to say: I confirm I get this warning only by closing the lid. No other operation seem to produce it, however I don't do many thinks while testing kernels: kmemcheck is slow and I'm not too confident with it. I still have it with 2.6.33-rc4, the warning is the first thing that appears in dmesg when I close the lid, then it is followed by the suspension code: WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88004bfd3b5c) 000110000000adde000220000000adde627574746f6e2f6c6964000000000000 i i i i i i i i i i i i i i i i i i i i i i i i i i i u u u u u ^ Pid: 3396, comm: acpid Not tainted 2.6.33-rc4 #3 Aspire 1510/Aspire 1510 RIP: 0010:[<ffffffff812694f5>] [<ffffffff812694f5>] acpi_bus_receive_event+0xf5/0x119 RSP: 0018:ffff88004cd79dd8 EFLAGS: 00010282 RAX: 0000000000000038 RBX: ffff88004bfd3b40 RCX: 0000000000000007 RDX: 00000000ffffffed RSI: ffff88004bfd3b5c RDI: ffff88004cd79e64 RBP: ffff88004cd79e28 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000282 R13: ffff88004cd79e48 R14: ffff88004cd79f48 R15: 00000000009e0420 FS: 00007fa9666316f0(0000) GS:ffffffff81a1c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88004f810c48 CR3: 00000000419ce000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 [<ffffffff8126fb9e>] acpi_system_read_event+0x6b/0x115 [<ffffffff810fd3f9>] proc_reg_read+0x79/0xb0 [<ffffffff810b1917>] vfs_read+0xc7/0x160 [<ffffffff810b1aa0>] sys_read+0x50/0x90 [<ffffffff81002468>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff eth1: deauthenticated from 00:07:cb:55:12:b0 (Reason: 7) PM: Syncing filesystems ... done. PM: Preparing system for mem sleep Freezing user space processes ... (elapsed 0.01 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. PM: Entering mem sleep Suspending console(s) (use no_console_suspend to debug) The faulty line is: christian@athor:~$ addr2line -e /usr/src/linux-2.6.33/vmlinux -i ffffffff8126fb9e /usr/src/linux-2.6.33/drivers/acpi/event.c:64 That is: static ssize_t acpi_system_read_event(struct file *file, char __user * buffer, size_t count, loff_t * ppos) { int result = 0; struct acpi_bus_event event; static char str[ACPI_MAX_STRING]; static int chars_remaining = 0; static char *ptr; if (!chars_remaining) { memset(&event, 0, sizeof(struct acpi_bus_event)); if ((file->f_flags & O_NONBLOCK) && (list_empty(&acpi_bus_event_list))) return -EAGAIN; result = acpi_bus_receive_event(&event); ---> if (result) return result; chars_remaining = sprintf(str, "%s %s %08x %08x\n", event.device_class ? event. device_class : "<unknown>", event.bus_id ? event. bus_id : "<unknown>", event.type, event.data); One more time, I don't understand what can trigger this, since result is assigned just the preceding line. Maybe acpi_bus_receive_events is inlined?
The requested info has been provided.
now can you reproduce it by closing the lid in 2.6.32?
Indeed I managed to reproduce it by closing the lid in 2.6.32.4. Finally it doesn't seem to be a regression, even if 2.6.33-rc2 and -rc1 didn't trigger it (don't know why I don't get it). I remove the regression flag and adjust the kernel version on which it occurs to 2.6.32.4 (it may also occurs before that version, but I havent tested previous kernels).
I'm confused... (In reply to comment #7) > Indeed I managed to reproduce it by closing the lid in 2.6.32.4. 2.6.32.4 has this problem. > Finally it doesn't seem to be a regression, even if 2.6.33-rc2 and -rc1 > didn't > trigger it (don't know why I don't get it). > and the problem doesn't exist in 2.6.33-rc1 and rc1. > I remove the regression flag and adjust the kernel version on which it occurs > to 2.6.32.4 (it may also occurs before that version, but I havent tested > previous kernels). so the problem is fixed in 2.6.33-rc? it would be great if you can verify if the problem exists in older kernels releases.
I've not seen this problem with 2.6.33-rc1 and -rc2. However, that doesn't mean that these versions fix it, since I also got other problems with suspend and maybe the code path was different, or maybe I simply missed the warning by out of luck (after all, I also missed it with 2.6.32 until I retry it explicitly as said in comment #7, since I also checked 2.6.32 with kmemcheck at least up to 2.6.32-rc2). I've also tester 2.6.31, and it didn't trigger the warning (I got dozens of schedule while atomic in acpi code however that are fixed now, but I haven't got any read uninitialized). To sum up: 2.6.31: ok as far as this warning is concerned, didn't occurs 2.6.32.4 : nok (warning triggered) 2.6.33-rc1 and rc2: ok, but maybe I should recheck it 2.6.33-rc3 to -rc5: nok (warning triggered) So I think the problem is indeed there since somewhere between 2.6.31 and 2.6.32.4, and I simply missed it in 33-rc1 & 2 for another reason. Maybe it was introduced lately in 2.6.32 development cycle (I remove kmemcheck options as soon as the kernel begins to work correctly on my laptop to boot more quickly). I could bisect it if necessary, but I have only time for 1-2 kernel check per week on my spare time so it will take a long if I start this bisection.
So this looks like a regression from 2.6.31.
I've done 3 more checked: - 2.6.32-rc5 does have the warning. - 2.6.32-rc3 is unable to suspend correctly my laptop (it freeze). However, it does not warn when I close the lid but do not suspend (by commenting out my script /etc/acpi/hibernate.sh, that is called from /etc/acpi/acpi_handler.sh on the lid close event). - 2.6.32-rc4 does have the warning when I close the lid, even without suspending to RAM (that is, alos with /etc/acpi/hibernate.sh commented out). It seems I don't need to suspend to have the warning: only closing the lid is sufficient. I'll have tot recheck 2.6.32-rc3, to be sure it doesn't warn, since I feel the warning is not always triggered. But I'm quite confident the warning was introduced between rc3 and rc4. The curious thing is that I've not seen any acpi related commit between these two rc. I've also forgotten how to get a git repo to bisect so I won't bisect more tonight.
2.6.32-rc1 triggers the warning too. I missed it in rc3 apparently.
On Tuesday 26 January 2010, Christian Casteyde wrote: > It's also still present in 33-rc5.
I don't have any clues about this one. It would be great of you can run git-bisect to narrow down the problem.
It was quite long due to slow boots with kmemcheck, but I managed to bisect it. I strictly don't understand why this commit causes the bug, more related to /proc fs or vfs than to acpi apparently: eca6f534e61919b28fb21aafbd1c2983deae75be is first bad commit commit eca6f534e61919b28fb21aafbd1c2983deae75be Author: Vegard Nossum <vegard.nossum@gmail.com> Date: Fri Sep 18 13:05:45 2009 -0700 fs: fix overflow in sys_mount() for in-kernel calls sys_mount() reads/copies a whole page for its "type" parameter. When do_mount_root() passes a kernel address that points to an object which is smaller than a whole page, copy_mount_options() will happily go past this memory object, possibly dereferencing "wild" pointers that could be in any state (hence the kmemcheck warning, which shows that parts of the next page are not even allocated). (The likelihood of something going wrong here is pretty low -- first of all this only applies to kernel calls to sys_mount(), which are mostly found in the boot code. Secondly, I guess if the page was not mapped, exact_copy_from_user() _would_ in fact handle it correctly because of its access_ok(), etc. checks.) But it is much nicer to avoid the dubious reads altogether, by stopping as soon as we find a NUL byte. Is there a good reason why we can't do something like this, using the already existing strndup_from_user()? [akpm@linux-foundation.org: make copy_mount_string() static] [AV: fix compat mount breakage, which involves undoing akpm's change above] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: al <al@dizzy.pdmi.ras.ru> :040000 040000 a89bfba8d50e5dfc6e77696d93cd772cd0ffacf3 9a3066b7678c117ea08efaf15fe558a63ccbda61 M fs I append the git bisect log.
Created attachment 24815 [details] git bisect log
First-Bad-Commit : eca6f534e61919b28fb21aafbd1c2983deae75be
On Monday 01 February 2010, Christian Casteyde wrote: > Yes, still applies to 2.6.32-rc6.
On Monday 08 February 2010, Christian Casteyde wrote: > Still present in 2.6.33-rc7. > > Bonjour, > Le lundi 8 février 2010, vous avez écrit : > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.31 and 2.6.32. > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.31 and 2.6.32. Please verify if it still should > > be listed and let the tracking team know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14998 > > Subject : Caught 32-bit read from uninitialized memory in > > acpi_system_read_event -- 2.6.31 regression Submitter : Christian > Casteyde > > <casteyde.christian@free.fr> > > Date : 2010-01-06 21:40 (33 days old)
On Saturday 27 February 2010, Christian Casteyde wrote: > Still present in 2.6.33.
Update : still present in 2.6.34-rc1.
Update : still present in 2.6.34-rc2.
It's hard for me to see how eca6f534e61919 affects anything. What about if (just for laughs) we change the sprintf() to snprintf()? I've attached a patch. After that could you post a full dmesg and an lspci? (I'm getting kicked out of my internet cafe. :P)
Created attachment 25775 [details] change sprintf to snprintf changes sprintf() to snprintf().
Changing to snprintf didn't changed anything on 2.6.34-rc3. I'm appending the full dmesg log of 34-rc3, and lspci from 2.6.33 (I forgot to do it on -34).
Created attachment 25783 [details] dmesg output for 2.6.34 with snprintf patch and suspend/resume
Created attachment 25784 [details] lspci -v for 2.6.33
My sprintf() patch isn't right. snprintf() returns the same number that sprintf() would have. But I think the bug might be one of these: https://patchwork.kernel.org/patch/86927/ http://lkml.org/lkml/2010/3/25/217 Those could easily cause a bug at the result = acpi_bus_receive_event(&event);
Update: I haven't checked those patches yet, however this bug is still present in 2.6.34-rc4.
Well, those patches doesn't fix this bug. I still have the warning of reading uninitialized memory with them.
If we take the last bit of initialized data: 000110000000adde000220000000adde627574746f6e2f6c6964000000000000 i i i i i i i i i i i i i i i i i i i i i i i i i i i u u u u u unsigned short a[] = {0x6275, 0x7474, 0x6f6e, 0x2f6c, 0x6964, 0x0000}; printf("%s\n", a); That prints "ubttnol/di" or "button/lid" after we swap the characters. So basically there are some uninitialized bytes after the NULL terminator. This looks like a false positive. BTW we don't leak the uninitialized bytes to the user do we? I'll post a patch that probably will get rid of the message.
Created attachment 26055 [details] mark allocation as false positive This marks the allocation as a false positive so that kmemcheck doesn't complain.
Tested: With 2.6.34-rc5 + patch from comment #32, the warning doesn't occur anymore.
(In reply to comment #4) > I've forgotten to say: I confirm I get this warning only by closing the lid. > No > other operation seem to produce it, however I don't do many thinks while > testing kernels: kmemcheck is slow and I'm not too confident with it. > > I still have it with 2.6.33-rc4, the warning is the first thing that appears > in > dmesg when I close the lid, then it is followed by the suspension code: > > > WARNING: kmemcheck: Caught 32-bit read from uninitialized memory > (ffff88004bfd3b5c) > 000110000000adde000220000000adde627574746f6e2f6c6964000000000000 > i i i i i i i i i i i i i i i i i i i i i i i i i i i u u u u u > ^ > > Pid: 3396, comm: acpid Not tainted 2.6.33-rc4 #3 Aspire 1510/Aspire 1510 > RIP: 0010:[<ffffffff812694f5>] [<ffffffff812694f5>] > acpi_bus_receive_event+0xf5/0x119 > RSP: 0018:ffff88004cd79dd8 EFLAGS: 00010282 > RAX: 0000000000000038 RBX: ffff88004bfd3b40 RCX: 0000000000000007 > RDX: 00000000ffffffed RSI: ffff88004bfd3b5c RDI: ffff88004cd79e64 > RBP: ffff88004cd79e28 R08: 0000000000000000 R09: 0000000000000001 > R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000282 > R13: ffff88004cd79e48 R14: ffff88004cd79f48 R15: 00000000009e0420 > FS: 00007fa9666316f0(0000) GS:ffffffff81a1c000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffff88004f810c48 CR3: 00000000419ce000 CR4: 00000000000006f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 > [<ffffffff8126fb9e>] acpi_system_read_event+0x6b/0x115 > [<ffffffff810fd3f9>] proc_reg_read+0x79/0xb0 > [<ffffffff810b1917>] vfs_read+0xc7/0x160 > [<ffffffff810b1aa0>] sys_read+0x50/0x90 > [<ffffffff81002468>] system_call_fastpath+0x16/0x1b > [<ffffffffffffffff>] 0xffffffffffffffff > eth1: deauthenticated from 00:07:cb:55:12:b0 (Reason: 7) > PM: Syncing filesystems ... done. > PM: Preparing system for mem sleep > Freezing user space processes ... (elapsed 0.01 seconds) done. > Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. > PM: Entering mem sleep > Suspending console(s) (use no_console_suspend to debug) > > The faulty line is: > christian@athor:~$ addr2line -e /usr/src/linux-2.6.33/vmlinux -i > ffffffff8126fb9e > /usr/src/linux-2.6.33/drivers/acpi/event.c:64 > > That is: > > static ssize_t > acpi_system_read_event(struct file *file, char __user * buffer, size_t count, > loff_t * ppos) > { > int result = 0; > struct acpi_bus_event event; > static char str[ACPI_MAX_STRING]; > static int chars_remaining = 0; > static char *ptr; > > if (!chars_remaining) { > memset(&event, 0, sizeof(struct acpi_bus_event)); > > if ((file->f_flags & O_NONBLOCK) > && (list_empty(&acpi_bus_event_list))) > return -EAGAIN; > > result = acpi_bus_receive_event(&event); > ---> if (result) > return result; > > chars_remaining = sprintf(str, "%s %s %08x %08x\n", > event.device_class ? event. > device_class : "<unknown>", > event.bus_id ? event. > bus_id : "<unknown>", event.type, > event.data); > > One more time, I don't understand what can trigger this, since result is > assigned just the preceding line. Maybe acpi_bus_receive_events is inlined? I can explain why. You ran addr2line on the address of the first stack entry: > [<ffffffff8126fb9e>] acpi_system_read_event+0x6b/0x115 > [<ffffffff810fd3f9>] proc_reg_read+0x79/0xb0 > christian@athor:~$ addr2line -e /usr/src/linux-2.6.33/vmlinux -i > ffffffff8126fb9e > /usr/src/linux-2.6.33/drivers/acpi/event.c:64 It would have been more appropriate to run addr2line on the RIP value, i.e. ffffffff812694f5: > RIP: 0010:[<ffffffff812694f5>] [<ffffffff812694f5>] Additionally, since the stack entries are essentially return addresses, they generally refer to the instruction just after the CALL instruction, not the CALL itself.
(By the way, I forgot to say: Thanks for the testing effort and the report!) The actual problem here is that acpi_bus_receive_event() does a full memcpy between two structures: int acpi_bus_receive_event(struct acpi_bus_event *event) { ... memcpy(event, entry, sizeof(struct acpi_bus_event)); ... } Clearly, since kmemcheck complains about a read, at least the "entry" pointer is tracked by kmemcheck and partially uninitialized. What about the "event" pointer? It comes from the caller -- acpi_system_read_event(): acpi_system_read_event(struct file *file, char __user * buffer, size_t count, loff_t * ppos) { ... struct acpi_bus_event event; ... result = acpi_bus_receive_event(&event); ... } So the actual problem is that "event" was allocated on the stack and is not tracked by kmemcheck, so copying something uninitialized into it will trigger the report. I am in two minds about the proposed fix. The currently posted patch is obviously the less intrusive one in terms of both correctness and churn, but here are some alternative suggestions: 1. Copy the struct "by hand" and use strncpy() for the strings to avoid the uninitialized areas. 2. Allocate the event with kmalloc() instead of on the stack and free it after the call to acpi_bus_receive_event(). 3. Allocate the event with kmalloc() and pass ownership of the structure to acpi_bus_receive_event() to avoid the copying altogether. Vegard
Just to confirm, I redit an addr2kine with rip as you mentioned: it is indeed in memcpy that the problem occurs: /usr/src/linux-2.6.33/arch/x86/include/asm/string_64.h:12 /usr/src/linux/drivers/acpi/bus.c:598 spin_lock_irqsave(&acpi_bus_event_lock, flags); if (!list_empty(&acpi_bus_event_list)) { entry = list_entry(acpi_bus_event_list.next, struct acpi_bus_event, node); list_del(&entry->node); } spin_unlock_irqrestore(&acpi_bus_event_lock, flags); if (!entry) return -ENODEV; here ---> memcpy(event, entry, sizeof(struct acpi_bus_event)); kfree(entry); return 0; }
https://patchwork.kernel.org/patch/95222/ applied to acpi tree
Update: Still present in 2.6.34-rc6.
Len Brown picked up the kmalloc() => kzalloc() patch but it takes a while to filter into Linus's tree. I don't know what Len's plan is, but there is even a possibility that the patch might get delayed until 2.6.35. We're pretty far into the -rc cycle for 2.6.34 and we're only supposed to be merging crashing fixes now. This isn't a crashing bug, it's just a warning and the kmemcheck option is mostly for debugging only. Then again, I think Len has some crashing bugs still left to merge so this might get included with those. Or maybe he'll let it sit in -next for a week and see if anyone complains. I don't know. I don't have to make any decisions fortunately. For now just try ignore the message. :)
shipped in 2.6.34 closed.