5565 – Guess of i386 APIC PTE area scribble

Summary: Guess of i386 APIC PTE area scribble

Status:	REJECTED UNREPRODUCIBLE

Alias:	None

Product:	Platform Specific/Hardware
Classification:	Unclassified
Component:	i386
Hardware:	i386 Linux

Importance:	P2 high
Assignee:	Zwane Mwaikambo

Reported:	2005-11-07 14:27 UTC by Andrew J. Kroll
Modified:	2008-09-22 15:59 UTC
CC List:	5 users

Kernel Version:	2.6.13.4
Regression:	---

Attachments
do_page_fault debug patch (813 bytes, patch) 2005-11-14 00:15 UTC, Zwane Mwaikambo
patch to the patch (1.19 KB, text/plain) 2005-11-14 04:31 UTC, Andrew J. Kroll
Lights, Camera, Action... Death (158 bytes, text/html) 2005-11-14 05:19 UTC, Andrew J. Kroll
Additional panic information capture (158 bytes, text/html) 2005-11-14 20:45 UTC, Andrew J. Kroll
More debug info (91.09 KB, text/plain) 2005-11-15 22:08 UTC, Andrew J. Kroll
2.2.17 working IDE + other stuff patch (688.77 KB, application/octet-stream) 2006-01-29 05:09 UTC, Andrew J. Kroll

Description Andrew J. Kroll 2005-11-07 14:27:58 UTC

Most recent kernel where this bug did not occur: 2.2.13
Distribution: Any
Hardware Environment: 2x i686 (Pentium III Coppermine) @ 650 MHz, 100 MHz FSB; twelve IDE
HDDs on three PDC20262 controllers, SB16PCI (AKA Ensoniq 5880), Realtek RTL-8196, MGA G200,
[3C509B, EMU10K, COM90C66 ARCnet, and Lava LP card on ISA], 419856K of PC133 RAM.
Software Environment: Debian Sarge, with kernel.org kernel
Problem Description: an IO-APIC IRQ of DEATH under very high IRQ load, which seems to
scribble on the PTE area.

Steps to reproduce (from bash):
for i in a b c d e f g h i j k l; do dd if=/dev/hd$i of=/dev/null & done

This simulates a normal load when I do a parallel backup.
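A slightly more defensive form of the reproduction loop is sketched here. It is not from the report: `parallel_read` is a hypothetical helper name, and passing the devices as arguments (rather than hard-coding hda through hdl) is an assumption for convenience. The read itself is unchanged; the only addition is the final `wait`, which makes the shell block until every dd has exited, so the end of the load is easy to see.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): start one sequential
# read per device in the background, then wait for all of them.
parallel_read() {
    for dev in "$@"; do
        # Same read as the original loop: copy the device to /dev/null.
        dd if="$dev" of=/dev/null &
    done
    # Block until every background dd has exited.
    wait
}
```

For example, `parallel_read /dev/hd[a-l]` reads every existing IDE disk node from hda to hdl in parallel, matching the twelve-drive setup described above.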

Photo references:
http://dr.ea.ms/~oldfart/panics/P1010004.JPG (stock kernel)
http://dr.ea.ms/~oldfart/panics/P1010009.JPG (stock kernel with 2 bogus PTE entries)

Another report in the wild suggests this can also happen on other hardware
combinations when disk and network I/O are exercised heavily.

Comment 1 Andrew J. Kroll 2005-11-12 19:08:54 UTC

vmlinux binary, .config file, lspci output, etc.:
http://dr.ea.ms/~oldfart/panics/diagdata.tgz

Source tree including the PTE area fudge (CAUTION: LARGE FILE, 71.0 MB!):
http://dr.ea.ms/~oldfart/panics/linux-2.6.13.4-panic.tgz

Hope these resources can assist in finding the problem.

Comment 2 Zwane Mwaikambo 2005-11-14 00:15:05 UTC

Created attachment 6575
do_page_fault debug patch

Could you please reproduce the bug with the attached patch?

Comment 3 Andrew J. Kroll 2005-11-14 03:57:25 UTC

The patch gave an undefined reference to read_cr3...

Comment 4 Andrew J. Kroll 2005-11-14 04:23:51 UTC

It compiled after I copied the routine from the kernel exec source.

Comment 5 Andrew J. Kroll 2005-11-14 04:31:57 UTC

Created attachment 6577
patch to the patch

Comment 6 Andrew J. Kroll 2005-11-14 05:19:57 UTC

Created attachment 6578
Lights, Camera, Action... Death

Comment 7 Andrew J. Kroll 2005-11-14 20:45:07 UTC

Created attachment 6589
Additional panic information capture

I added an mdelay(1) to the DMA routine in pdc*old.c so I could capture the
rest of the printk output before the machine went completely dark.

Comment 8 Andrew J. Kroll 2005-11-15 22:08:10 UTC

Created attachment 6594 [details]
More debug info

After attaching a serial console and doing a clean rebuild, the fault address
moved... however, it is otherwise fairly consistent. This is as interesting as
it is annoying, and I am beginning to suspect the problem is some sort of race
condition under high load in which the DMA engine is given the wrong address.

Comment 9 Andrew J. Kroll 2006-01-29 05:09:58 UTC

Created attachment 7171
2.2.17 working IDE + other stuff patch

Here is the entire tree patch I use. It also includes various miscellaneous bug
fixes that are unrelated to the IDE issue; they are nice to have and could
perhaps be a factor, so I am uploading the entire set. Also included are the
config settings I use, so that one can reproduce the kernel build with
gcc 2.7.2.3 and libc5.

Comment 10 Adrian Bunk 2006-11-13 07:53:11 UTC

Is this bug still present in kernel 2.6.18?

Comment 11 Andrew J. Kroll 2006-11-27 23:49:57 UTC

I still need to check; I shall do so in a few days. My guess is that it
probably still is.

Comment 12 Andrew J. Kroll 2007-01-31 18:12:00 UTC

Finally did the test. I enabled more tracing options and the like, and got an
SMP lock wedged on CPU#0, then, after some time, the debug dump. I apologize
for the poor photos, but they show the same error.

http://dr.ea.ms/~oldfart/panics/002.bmp
http://dr.ea.ms/~oldfart/panics/005.bmp

Comment 13 Andrew J. Kroll 2007-01-31 18:18:15 UTC

Oh, and for reference, it was tested on 2.6.19.2. Sorry, forgot. ;-)

Comment 14 Zwane Mwaikambo 2007-01-31 23:14:25 UTC

It looks like it may still be trapping on the APIC access. Can you get a picture
of the main oops dump?

Comment 15 Natalie Protasevich 2007-07-03 18:11:08 UTC

Andrew, can you please take a serial capture with the latest kernel?
Zwane, would that be sufficient? As I understand it, you need the oops from an unmodified kernel.

Comment 16 Anonymous Emailer 2007-07-16 14:26:59 UTC

Reply-To: akpm@linux-foundation.org

test to bugme-daemon@kernel-bugs.osdl.org, please ignore.

Comment 17 Anonymous Emailer 2007-07-16 14:27:44 UTC

Reply-To: akpm@linux-foundation.org

test to bugme-daemon@bugzilla.kernel.org, please ignore

Comment 18 Andrew Morton 2007-08-02 15:44:59 UTC

This is looking like a dead bug.  Let's shut it down if
nobody can reproduce it in 2.6.22.

Comment 19 Andrew J. Kroll 2007-08-06 16:07:29 UTC

I guess I will have to test again... If it still fails, then I will assume it is a motherboard issue.

Comment 20 Natalie Protasevich 2008-03-05 22:20:01 UTC

Andrew, any news? Did you get a chance to try the latest kernel?
