Bug 973
Summary: | Presario laptop panic during boot | ||
---|---|---|---|
Product: | Other | Reporter: | Thomas Molina (tmolina) |
Component: | Other | Assignee: | Zwane Mwaikambo (zwane) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | ||
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.0-test1 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | working 2.6.0-test3-mm2 dmesg output; configuration used to build 2.6.0-test3-mm2; serio race fix; System map built with 2.6.0-test7 from bitkeeper as of 9 Oct; handle unaligned stack; rediffed handle unaligned stack patch |
Description
Thomas Molina
2003-07-20 18:07:24 UTC
Could you try and collect the oops in its entirety please. Thanks.

I have an update to this. I went back and tested various kernel revisions. The above panic started happening in 2.5.74-bk1. This appears to be where the store_stackinfo function was added, protected by CONFIG_DEBUG_PAGEALLOC. Sure enough, building a 2.6.0-test1 with the same configuration minus page allocation debugging produces a kernel which boots without the above panic. However, now I get the following oops, which at least makes it into the message log. This oops happens during bootup, followed a minute or two later by a follow-on oops which I am unable to capture. Unfortunately, the oops does not happen with every boot.

Jul 25 04:54:24 lap last message repeated 2 times
Jul 25 04:54:24 lap kernel: Unable to query/initialize Synaptics hardware.
Jul 25 04:54:24 lap kernel: input: PS/2 Synaptics TouchPad on isa0060/serio1
Jul 25 04:54:24 lap kernel: slab error in cache_free_debugcheck(): cache `size-32': double free, or memory before object was overwritten
Jul 25 04:54:24 lap kernel: Call Trace:
Jul 25 04:54:24 lap kernel: [<c014f130>] kfree+0xf0/0x310
Jul 25 04:54:24 lap kernel: [<c02540fc>] psmouse_disconnect+0x2c/0x40
Jul 25 04:54:24 lap kernel: [<c02540fc>] psmouse_disconnect+0x2c/0x40
Jul 25 04:54:24 lap kernel: [<c025560d>] serio_handle_events+0xad/0xc0
Jul 25 04:54:24 lap kernel: [<c0255620>] serio_thread+0x0/0x100
Jul 25 04:54:24 lap kernel: [<c0255665>] serio_thread+0x45/0x100
Jul 25 04:54:24 lap kernel: [<c010a126>] work_resched+0x5/0x16
Jul 25 04:54:24 lap kernel: [<c0255620>] serio_thread+0x0/0x100
Jul 25 04:54:24 lap kernel: [<c0255620>] serio_thread+0x0/0x100
Jul 25 04:54:24 lap kernel: [<c01073b9>] kernel_thread_helper+0x5/0xc
Jul 25 04:54:24 lap kernel:
Jul 25 04:54:24 lap kernel: slab error in cache_free_debugcheck(): cache `size-32': double free, or memory after object was overwritten
Jul 25 04:54:24 lap kernel: Call Trace:
Jul 25 04:54:24 lap kernel: [<c014f15e>] kfree+0x11e/0x310
Jul 25 04:54:24 lap kernel: [<c02540fc>] psmouse_disconnect+0x2c/0x40
Jul 25 04:54:24 lap kernel: [<c02540fc>] psmouse_disconnect+0x2c/0x40
Jul 25 04:54:24 lap kernel: [<c025560d>] serio_handle_events+0xad/0xc0
Jul 25 04:54:24 lap kernel: [<c0255620>] serio_thread+0x0/0x100
Jul 25 04:54:24 lap kernel: [<c0255665>] serio_thread+0x45/0x100
Jul 25 04:54:24 lap kernel: [<c010a126>] work_resched+0x5/0x16
Jul 25 04:54:24 lap kernel: [<c0255620>] serio_thread+0x0/0x100
Jul 25 04:54:24 lap kernel: [<c0255620>] serio_thread+0x0/0x100
Jul 25 04:54:24 lap kernel: [<c01073b9>] kernel_thread_helper+0x5/0xc
Jul 25 04:54:24 lap kernel:
Jul 25 04:54:24 lap kernel: synaptics reset failed
Jul 25 04:54:24 lap last message repeated 2 times
Jul 25 04:54:24 lap kernel: Synaptics Touchpad, model: 1
Jul 25 04:54:24 lap kernel: Firware: 4.6
Jul 25 04:54:24 lap kernel: Sensor: 15
Jul 25 04:54:24 lap kernel: new absolute packet format
Jul 25 04:54:24 lap kernel: Touchpad has extended capability bits
Jul 25 04:54:24 lap kernel: -> four buttons
Jul 25 04:54:24 lap kernel: -> multifinger detection
Jul 25 04:54:24 lap kernel: -> palm detection
Jul 25 04:54:24 lap kernel: input: Synaptics Synaptics TouchPad on isa0060/serio1
Jul 25 04:54:24 lap kernel: Slab corruption: start=c11c3d34, expend=c11c3d53, problemat=c11c3d34
Jul 25 04:54:24 lap kernel: Last user: [<c010c4ae>](request_irq+0x5e/0xd0)
Jul 25 04:54:24 lap kernel: Data: 10 63 25 C0 00 00 00 04 00 00 00 00 89 8D 2D C0 20 A4 38 C0 00 00 00 00 .......A5
Jul 25 04:54:24 lap kernel: Next: A5 C2 0F 17 AE C4 10 C0 A5 C2 0F 17 00 00 00 00 00 00 00 00 0D F0 AD BA 00 00 00 00 ....
Jul 25 04:54:25 lap kernel: slab error in check_poison_obj(): cache `size-32': object was modified after freeing
Jul 25 04:54:25 lap kernel: Call Trace:
Jul 25 04:54:25 lap kernel: [<c014cb7c>] check_poison_obj+0x17c/0x1d0
Jul 25 04:54:25 lap kernel: [<c014ed39>] __kmalloc+0x169/0x1d0
Jul 25 04:54:25 lap kernel: [<c014f62c>] do_tune_cpucache+0x22c/0x4c0
Jul 25 04:54:25 lap kernel: [<c014f62c>] do_tune_cpucache+0x22c/0x4c0
Jul 25 04:54:25 lap kernel: [<c014f92f>] enable_cpucache+0x6f/0xa0
Jul 25 04:54:25 lap kernel: [<c014d2ff>] kmem_cache_create+0x56f/0x640
Jul 25 04:54:25 lap kernel: [<c0366f54>] ip_rt_init+0x64/0x3f0
Jul 25 04:54:25 lap kernel: [<c026a2ef>] neigh_sysctl_register+0x14f/0x1b0
Jul 25 04:54:25 lap kernel: [<c03673a7>] ip_init+0x17/0x20
Jul 25 04:54:25 lap kernel: [<c0367f26>] inet_init+0x176/0x210
Jul 25 04:54:25 lap kernel: [<c03547ab>] do_initcalls+0x2b/0xa0
Jul 25 04:54:25 lap kernel: [<c0137432>] init_workqueues+0x12/0x30
Jul 25 04:54:25 lap kernel: [<c0105068>] init+0x28/0x150
Jul 25 04:54:25 lap kernel: [<c0105040>] init+0x0/0x150
Jul 25 04:54:25 lap kernel: [<c01073b9>] kernel_thread_helper+0x5/0xc
Jul 25 04:54:25 lap kernel:
Jul 25 04:54:25 lap kernel: slab error in cache_alloc_debugcheck_after(): cache `size-32': memory before object was overwritten
Jul 25 04:54:25 lap kernel: Call Trace:
Jul 25 04:54:25 lap kernel: [<c014ecc3>] __kmalloc+0xf3/0x1d0
Jul 25 04:54:25 lap kernel: [<c014f62c>] do_tune_cpucache+0x22c/0x4c0
Jul 25 04:54:25 lap kernel: [<c014f62c>] do_tune_cpucache+0x22c/0x4c0
Jul 25 04:54:25 lap kernel: [<c014f92f>] enable_cpucache+0x6f/0xa0
Jul 25 04:54:25 lap kernel: [<c014d2ff>] kmem_cache_create+0x56f/0x640
Jul 25 04:54:25 lap kernel: [<c0366f54>] ip_rt_init+0x64/0x3f0
Jul 25 04:54:25 lap kernel: [<c026a2ef>] neigh_sysctl_register+0x14f/0x1b0
Jul 25 04:54:25 lap kernel: [<c03673a7>] ip_init+0x17/0x20
Jul 25 04:54:25 lap kernel: [<c0367f26>] inet_init+0x176/0x210
Jul 25 04:54:25 lap kernel: [<c03547ab>] do_initcalls+0x2b/0xa0
Jul 25 04:54:25 lap kernel: [<c0137432>] init_workqueues+0x12/0x30
Jul 25 04:54:25 lap kernel: [<c0105068>] init+0x28/0x150
Jul 25 04:54:25 lap kernel: [<c0105040>] init+0x0/0x150
Jul 25 04:54:25 lap kernel: [<c01073b9>] kernel_thread_helper+0x5/0xc
Jul 25 04:54:25 lap kernel:
Jul 25 04:54:25 lap kernel:
Jul 25 04:54:25 lap kernel: slab error in cache_alloc_debugcheck_after(): cache `size-32': memory after object was overwritten
Jul 25 04:54:25 lap kernel: Call Trace:
Jul 25 04:54:25 lap kernel: [<c014eceb>] __kmalloc+0x11b/0x1d0
Jul 25 04:54:25 lap kernel: [<c014f62c>] do_tune_cpucache+0x22c/0x4c0
Jul 25 04:54:25 lap kernel: [<c014f62c>] do_tune_cpucache+0x22c/0x4c0
Jul 25 04:54:25 lap kernel: [<c014f92f>] enable_cpucache+0x6f/0xa0
Jul 25 04:54:25 lap kernel: [<c014d2ff>] kmem_cache_create+0x56f/0x640
Jul 25 04:54:25 lap kernel: [<c0366f54>] ip_rt_init+0x64/0x3f0
Jul 25 04:54:25 lap /sbin/hotplug: no runnable /etc/hotplug/pcmcia_socket.agent is installed
Jul 25 04:54:25 lap kernel: [<c026a2ef>] neigh_sysctl_register+0x14f/0x1b0
Jul 25 04:54:25 lap pcmcia: Starting PCMCIA services:
Jul 25 04:54:25 lap kernel: [<c03673a7>] ip_init+0x17/0x20
Jul 25 04:54:25 lap kernel: [<c0367f26>] inet_init+0x176/0x210
Jul 25 04:54:25 lap kernel: [<c03547ab>] do_initcalls+0x2b/0xa0
Jul 25 04:54:25 lap kernel: [<c0137432>] init_workqueues+0x12/0x30
Jul 25 04:54:25 lap kernel: [<c0105068>] init+0x28/0x150
Jul 25 04:54:25 lap kernel: [<c0105040>] init+0x0/0x150
Jul 25 04:54:25 lap kernel: [<c01073b9>] kernel_thread_helper+0x5/0xc
Jul 25 04:54:25 lap kernel:
Jul 25 04:54:25 lap kernel: IP: routing cache hash table of 128 buckets, 4Kbytes

Another update: This behaviour continues in 2.6.0-test2.

Could you please try test2-mm3 and, if that doesn't work, you can apply the following patch.

I have retested with 2.6.0-test2 as well as the mm5 patch from Andrew Morton and I am still seeing the same panic.
I tried the mm5 patchset both with and without Synaptics support and the panic does not go away.

Behaviour continues in 2.6.0-test3.

Please send the /var/log/dmesg from 2.6.0-test3-mm1 _without_ synaptics mouse support (it's creating noise).

Created attachment 664 [details]
working 2.6.0-test3-mm2 dmesg output
Created attachment 665 [details]
configuration used to build 2.6.0-test3-mm2
Defining CONFIG_DEBUG_PAGEALLOC in this configuration produces an oops in
store_stackinfo. The stock Linus kernel produces an oops in the synaptics code
on boot, but the oops is not perfectly reproducible.
The reported oops continues to happen with 2.6.0-test4. With CONFIG_DEBUG_PAGEALLOC defined I get the same oops on each test. Without the define, the system boots and appears to operate without further problems.

Since my laptop doesn't have a serial port, I am unable to provide a complete oops capture. I have tried a USB-to-serial converter, but the support isn't available early enough to support a serial console. After looking at a number of reboots I am not sure it would help, since the only line in the original call trace not reflected below is one reading:

common_interrupt+0x18/0x20

This line appears following the line reading:

do_IRQ+0x117/0x350

Beyond that, there is an endless string of "unable to handle paging request" type oopses, apparently caused by the original panic. The only way I can look at what is on the screen before it scrolls off is to hold the power button down. That gives me about five seconds to see what is on the screen before it reboots.

Created attachment 709 [details]
serio race fix
There have been a few developments on LKML which might possibly affect you; we're currently waiting on the input subsystem maintainer to follow up on the patches posted. If you're feeling adventurous, I've attached one of the patches. Hang in there, I think we may get it fixed soon.

I tried the serio race fix on top of bk-current and still get the same oops in store_stackinfo. I will do some more testing tomorrow with test4-mm1 since akpm put out a new patchset.

Stock Linux kernels through 2.6.0-test4-bk latest continue to panic in store_stackinfo. However, 2.6.0-test4-mm4 does not panic when CONFIG_DEBUG_PAGEALLOC is defined as the others do. -mm1 and -mm2 also panic. I did not get a chance to test -mm3, but I could if it is deemed important.

Further testing reveals that the minimum set of kernel hacking options required to produce the reported panic is memory allocation debugging (CONFIG_DEBUG_SLAB) and page allocation debugging (CONFIG_DEBUG_PAGEALLOC). This testing also reveals that even -mmX kernels are subject to this panic. Retesting previously tested kernels reveals they are also consistent with the above. In particular, enabling every kernel debugging option EXCEPT memory allocation debugging produces a working kernel.

I installed RH 9 on my laptop today and recompiled the latest test4-bk pulled off the site. I got the expected kernel panic.

I continue to test nightly builds with the latest bk changesets and see no change in behaviour. 2.6.0-test5-mm1 was also tested with the same results.

I continue to get the same kernel panic with 2.6.0-test5 and 2.6.0-test5-mm2.

Ok, please build a new kernel with CONFIG_DEBUG_INFO also enabled, capture the output during boot (as much as you can) and attach the vmlinux and System.map. Let's use 2.6.0-test7.

Created attachment 1028 [details]
System map built with 2.6.0-test7 from bitkeeper as of 9 Oct
Created attachment 1030 [details]
handle unaligned stack
Some APM BIOS versions push 16-bit values to the kernel stack. The resulting
stack is not 32-bit aligned, and that breaks the end of stack detection in
show_stack and store_stackinfo.
The patch fixes this.
Created attachment 1034 [details]
rediffed handle unaligned stack patch
Previous patch had the kstack_end patch in an incorrect location. This
rediffed patch moves it to the correct location.
It is confirmed the unaligned stack patch fixes the reported problem. The patch has been incorporated into bitkeeper. This bug is resolved by code fix and will be marked closed when Linus releases the next 2.6.0 test version.

The submitted patch is confirmed to fix the reported bug. This patch has been incorporated into Linus' kernel tree. This closes this bug.