Bug 5565
Summary: | Guess of i386 APIC PTE area scribble | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Andrew J. Kroll (a) |
Component: | i386 | Assignee: | Zwane Mwaikambo (zwane) |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | high | CC: | akpm, bunk, bzolnier, mingo, protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13.4 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | do_page_fault debug patch; patch to the patch; Lights, Camera, Action... Death; Additional panic information capture; More debug info; 2.2.17 working IDE + other stuff patch | ||
Description
Andrew J. Kroll
2005-11-07 14:27:58 UTC
vmlinux binary, .config file, lspci, etc.:
http://dr.ea.ms/~oldfart/panics/diagdata.tgz

Source tree including the PTE area fudge (CAUTION: LARGE FILE, 71.0M):
http://dr.ea.ms/~oldfart/panics/linux-2.6.13.4-panic.tgz

Hope these resources can assist in finding the problem.

Created attachment 6575 [details]
do_page_fault debug patch
Could you please reproduce the bug with the attached patch?
The patch gave an undefined reference to read_cr3. It compiled after I copied the routine from the kernel exec source.

Created attachment 6577 [details]
patch to the patch
Created attachment 6578 [details]
Lights, Camera, Action... Death
Created attachment 6589 [details]
Additional panic information capture
Added an mdelay(1) in the pdc*old.c dma routine so I could capture the
remainder of the printk's before the machine would totally go dark.
Created attachment 6594 [details]
More debug info
After attaching a serial console and doing a clean and remake, etc., the fault
address moved... however, it is pretty much consistent. This is as interesting
as it is annoying, and I am beginning to suspect the problem is some sort of race
condition under high load where the DMA gets told the wrong address.....
Created attachment 7171 [details]
2.2.17 working IDE + other stuff patch
Here is the entire tree patch I use... it includes all sorts of misc bugfixes
as well that are not related to the IDE issue, but are nice to have and perhaps
could be a factor, thus I am tossing up the entire set. Also included are the
config settings that I use, so that one may be able to replicate the kernel
code with gcc version 2.7.2.3 and libc5.
Is this bug still present in kernel 2.6.18?

I still need to check whether it is; I shall check in a few days. My guess is that it probably still is.

Finally did the test... I added more options for tracing and whatnot... got an SMP lock wedged on cpu#0 then, after some time, the debug dump. I apologize for the poor photos, but they are the same error.
http://dr.ea.ms/~oldfart/panics/002.bmp
http://dr.ea.ms/~oldfart/panics/005.bmp

Oh yeah, and it was tested on 2.6.19.2, for reference ;-) Sorry, forgot.

It looks like it still may be trapping on the APIC access. Can you get a picture of the main oops dump?

Andrew, can you please take a serial capture with the latest kernel? Zwane, would that be sufficient? As I understand it, you needed an oops with an unmodified kernel.

Reply-To: akpm@linux-foundation.org
test to bugme-daemon@kernel-bugs.osdl.org, please ignore.

Reply-To: akpm@linux-foundation.org
test to bugme-daemon@bugzilla.kernel.org, please ignore

This is looking like a dead bug. Let's shut it down if nobody can reproduce it in 2.6.22.

I guess I will have to test again... If it still fails, then I will assume it is a motherboard issue.

Andrew, any news? Did you get a chance to try the latest kernel?