Bug 3639 - NForce3 problem with IOAPIC
Summary: NForce3 problem with IOAPIC
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: i386 Linux
Importance: P2 normal
Assignee: Zwane Mwaikambo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-10-25 08:24 UTC by Rafael J. Wysocki
Modified: 2005-08-10 04:47 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.10-rc1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg output for 2.6.9-mm1 (without noapic) (14.73 KB, text/plain)
2004-10-25 12:54 UTC, Rafael J. Wysocki
/proc/interrupts for 2.6.9-mm1 (without noapic) (570 bytes, text/plain)
2004-10-25 12:56 UTC, Rafael J. Wysocki
dmesg output for 2.6.10-rc1-mm2 (right after failure) (16.41 KB, text/plain)
2004-10-31 09:24 UTC, Rafael J. Wysocki
/proc/interrupts for 2.6.10-rc1-mm2 (right after failure) (584 bytes, text/plain)
2004-10-31 09:25 UTC, Rafael J. Wysocki
dont ignore timer override (762 bytes, patch)
2004-10-31 15:15 UTC, Zwane Mwaikambo
Oops trace for 2.6.9-rc4-mm1 (3.07 KB, text/plain)
2004-11-14 08:14 UTC, Rafael J. Wysocki
The output of acpidmp, Linux 2.6.10-rc2-mm1 (116.84 KB, text/plain)
2004-11-17 13:10 UTC, Rafael J. Wysocki
/proc/interrupts for 2.6.11-rc5-mm1 (with noapic) (599 bytes, text/plain)
2005-03-01 04:42 UTC, Rafael J. Wysocki

Description Rafael J. Wysocki 2004-10-25 08:24:46 UTC
Distribution: SuSE 9.1 x86_64  
Hardware Environment: Laptop, Asus, Athlon 64 + NForce3, 512 MB RAM, HDD  
IC25N060ATMR04-0 (Hitachi?), Yukon Gigabit Ethernet 10/100/1000Base-T Adapter,  
USB Controller: nVidia Corporation nForce3 USB 1.1 (rev a5), USB Controller:  
nVidia Corporation nForce3 USB 2.0 (rev a2), Multimedia audio controller:  
nVidia Corporation nForce3 Audio (rev a2)  
Software Environment: SuSE 9.1 x86_64 + linux-2.6.10-rc1 (x86_64)  
Problem Description: After some time in X, USB suddenly stops working and  
either sound goes off or the ethernet adapter stops working simultaneously  
(both share IRQs with the USB controller)  
  
Steps to reproduce: Boot the kernel, start X+KDE, work for some time using a 
USB mouse
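
Editor's note: the description hinges on which devices share an IRQ line with the USB controller. A minimal Python sketch of how that can be read from a saved copy of /proc/interrupts follows. The sample text is invented for illustration and is not taken from the attachments, and the function name shared_irqs is hypothetical. It assumes the 2.6-era layout: IRQ number, per-CPU counts, controller type, then a comma-separated device list.

```python
import re

# Invented sample in the 2.6-era /proc/interrupts layout (NOT from the attachments).
SAMPLE = """\
           CPU0
  0:     123456    IO-APIC-edge  timer
  1:        890    IO-APIC-edge  i8042
 19:      45678   IO-APIC-level  ohci_hcd, eth0
 21:       2345   IO-APIC-level  ehci_hcd, NVidia nForce3
NMI:          0
"""


def shared_irqs(interrupts_text):
    """Return {irq: [device, ...]} for IRQ lines claimed by more than one device."""
    shared = {}
    for line in interrupts_text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.*)", line)
        if not match:
            continue  # header line or named counters such as NMI
        irq, rest = match.groups()
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counts
        if i >= len(fields):
            continue  # no controller or device columns on this line
        # fields[i] is the controller type; everything after it is the device list.
        devices = [d.strip() for d in " ".join(fields[i + 1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(irq)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq} is shared by: {', '.join(devices)}")
```

Running the same function over the real attachments would show which devices sit on the USB controller's line on the reporter's machine.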
Comment 1 Rafael J. Wysocki 2004-10-25 12:54:19 UTC
Created attachment 3887
dmesg output for 2.6.9-mm1 (without noapic)
Comment 2 Rafael J. Wysocki 2004-10-25 12:56:26 UTC
Created attachment 3888
/proc/interrupts for 2.6.9-mm1 (without noapic)
Comment 3 Zwane Mwaikambo 2004-10-31 07:17:58 UTC
Could you also attach /proc/interrupts and dmesg for the failing kernel, for
easy reference? Thanks.
Comment 4 Rafael J. Wysocki 2004-10-31 08:15:43 UTC
The two that I've already attached are for the failing case (not right after 
the failure, if that is what you mean, but this very kernel with this very 
command line has failed a couple of times on my box).  I'll try to get some 
"right after the failure" logs for a more recent kernel, though. 
 
Comment 5 Rafael J. Wysocki 2004-10-31 09:24:38 UTC
Created attachment 3920
dmesg output for 2.6.10-rc1-mm2 (right after failure)

The "segfault" lines come from initscripts, but I don't know what exactly
causes them to appear.
Comment 6 Rafael J. Wysocki 2004-10-31 09:25:33 UTC
Created attachment 3921
/proc/interrupts for 2.6.10-rc1-mm2 (right after failure)
Comment 7 Zwane Mwaikambo 2004-10-31 15:15:34 UTC
Created attachment 3922
dont ignore timer override

Could you please test this patch.
Comment 8 Rafael J. Wysocki 2004-11-01 08:19:20 UTC
It doesn't help (tested on 2.6.10-rc1-mm2).
Comment 9 Zwane Mwaikambo 2004-11-03 13:47:24 UTC
OK, can you tell me which kernel is the last known working one and what the
kernel command line was for it? Thanks.
Comment 10 Rafael J. Wysocki 2004-11-03 14:40:51 UTC
The last working kernel that I tested was 2.6.9 and the command line for it 
was: root=/dev/hdc6 vga=792 resume=/dev/hdc3 pci=routeirq nmi_watchdog=0 
console=ttyS0,57600 console=tty0 
 
I haven't tested any -bk kernels after 2.6.9 and prior to 2.6.9-mm1, but the 
2.6.9-mm1 kernel fails for me with the above command line. 
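
Editor's note: several later comments turn on whether "noapic" or "pci=routeirq" was present at boot. A small Python sketch for checking a command line follows, using the one quoted in this comment. The helper name parse_cmdline is hypothetical, and the parsing is a simplification that splits on whitespace and does not handle quoted values.

```python
# Command line quoted in Comment 10 (last known working kernel, 2.6.9).
CMDLINE = (
    "root=/dev/hdc6 vga=792 resume=/dev/hdc3 pci=routeirq nmi_watchdog=0 "
    "console=ttyS0,57600 console=tty0"
)


def parse_cmdline(cmdline):
    """Map each option name to the list of values given for it.

    Bare flags such as "noapic" map to [None]; options that repeat
    (for example console=) keep all of their values in order.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options.setdefault(key, []).append(value if sep else None)
    return options


if __name__ == "__main__":
    opts = parse_cmdline(CMDLINE)
    print("noapic given:", "noapic" in opts)
    print("pci options:", opts.get("pci", []))
```

On a running system the same function could be fed the contents of /proc/cmdline.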
 
Comment 11 Zwane Mwaikambo 2004-11-13 14:00:06 UTC
That's curious, because I'm fairly certain that the following patch, which was
introduced within your time frame, is what broke it.

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/broken-out/fix-ioapic-on-nvidia-boards.patch

Could you please test 2.6.9-rc4 with that patch backed out?

Thanks!
Comment 12 Zwane Mwaikambo 2004-11-13 14:01:32 UTC
I meant 2.6.9-rc4-mm1, of course. Please test with and without the patch.
Comment 13 Rafael J. Wysocki 2004-11-14 08:14:41 UTC
Created attachment 4027
Oops trace for 2.6.9-rc4-mm1

Well, it's a bit tricky.  2.6.9-rc4-mm1 oopses for me at KDE startup (always)
with a trace similar to the attached one (both with and without the patch). 
However, if I boot it with pci=routeirq _and_ the patch is _reversed_, it
effectively assumes noapic (ie it uses XT-PIC instead of APIC).  If I boot it
_without_ pci=routeirq, it uses APIC anyway.
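
Editor's note: whether a given boot ended up in XT-PIC or IO-APIC mode can be read off the controller column of /proc/interrupts. A hedged Python sketch follows. It assumes the 2.6-era labels "XT-PIC", "IO-APIC-edge" and "IO-APIC-level"; the sample texts are invented and the function name interrupt_mode is hypothetical.

```python
# Invented samples (NOT from the attachments).
APIC_SAMPLE = """\
           CPU0
  0:     123456    IO-APIC-edge  timer
  2:          0          XT-PIC  cascade
 19:      45678   IO-APIC-level  ohci_hcd, eth0
NMI:          0
"""

PIC_SAMPLE = """\
           CPU0
  0:     123456          XT-PIC  timer
  2:          0          XT-PIC  cascade
 11:      45678          XT-PIC  ohci_hcd, eth0
NMI:          0
"""


def interrupt_mode(interrupts_text):
    """Return "IO-APIC", "XT-PIC" or "unknown" based on the controller column."""
    controllers = set()
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue  # header line or named counters such as NMI
        # Drop the per-CPU counts; the first remaining field is the controller type.
        rest = [f for f in fields[1:] if not f.isdigit()]
        if rest:
            controllers.add(rest[0])
    # A cascade line may stay on XT-PIC even in APIC mode, so IO-APIC wins.
    if any(c.startswith("IO-APIC") for c in controllers):
        return "IO-APIC"
    if "XT-PIC" in controllers:
        return "XT-PIC"
    return "unknown"


if __name__ == "__main__":
    print(interrupt_mode(APIC_SAMPLE))
    print(interrupt_mode(PIC_SAMPLE))
```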
Comment 14 Zwane Mwaikambo 2004-11-14 16:26:44 UTC
You can edit profile_hit() to be a nop: just remove the code within the
function. Also, a few things for you to try: enable MPS 1.4 (instead of 1.1) in
the BIOS, run the following program,
http://people.redhat.com/zaitcev/linux/mptable-2.0.15a-1.i386.rpm, and
send the output. Also, if possible, could you get output similar to the following
from your kernel boot?

IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ2 -> 0:2
IRQ3 -> 0:3
...
IRQ22 -> 0:22
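
Editor's note: the mapping lines requested above have the form "IRQn -> apic:pin". A minimal Python sketch for pulling them out of a saved boot log and flagging IRQs whose pin number differs from the IRQ number follows. The function names are hypothetical, and the sample reuses only the lines quoted in this comment.

```python
import re

# The lines quoted in Comment 14 (the elided middle part is left out).
SAMPLE = """\
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ2 -> 0:2
IRQ3 -> 0:3
IRQ22 -> 0:22
"""


def parse_irq_pin_map(dmesg_text):
    """Return {irq: [(apic, pin), ...]} from "IRQn -> apic:pin" lines."""
    mapping = {}
    # One IRQ may list several "-> apic:pin" entries on the same line.
    for match in re.finditer(r"IRQ(\d+)((?:\s*->\s*\d+:\d+)+)", dmesg_text):
        irq = int(match.group(1))
        pins = re.findall(r"->\s*(\d+):(\d+)", match.group(2))
        mapping[irq] = [(int(apic), int(pin)) for apic, pin in pins]
    return mapping


def remapped_irqs(mapping):
    """Return the IRQs for which no listed pin equals the IRQ number."""
    return sorted(irq for irq, pins in mapping.items()
                  if all(pin != irq for _, pin in pins))


if __name__ == "__main__":
    mapping = parse_irq_pin_map(SAMPLE)
    print("parsed IRQs:", sorted(mapping))
    print("remapped IRQs:", remapped_irqs(mapping))
```

In the quoted sample every IRQ maps to the pin of the same number, so nothing is flagged; a hypothetical line such as "IRQ0 -> 0:2" would be reported as remapped.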
Comment 15 Rafael J. Wysocki 2004-11-16 10:05:56 UTC
OK 
Should I do it for 2.6.9-rc4-mm1, or can I choose something newer (eg that 
doesn't oops)?  If so, will 2.6.10-rc2-mm1 be fine? 
 
Comment 16 Zwane Mwaikambo 2004-11-16 14:38:34 UTC
Please try 2.6.9-rc4-mm1: edit arch/x86_64/kernel/time.c:profile_hit, remove
all the code within it, and make it simply return 0. Would you prefer a
patch for that too?
Comment 17 Len Brown 2004-11-17 12:40:00 UTC
dmesg:
 >>> ERROR: Invalid checksum

please verify that you're running the latest BIOS.

then please attach the output from acpidmp, available in /usr/sbin
or in pmtools here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils
Comment 18 Rafael J. Wysocki 2004-11-17 13:10:03 UTC
Created attachment 4068
The output of acpidmp, Linux 2.6.10-rc2-mm1

Referring to Comment #17:

AFAIK, there are no BIOS upgrades for my box, but I wouldn't upgrade anyway
(it's a notebook and I have no replacement ;-)). Sorry.  The output of acpidmp
is attached.
Comment 19 Rafael J. Wysocki 2004-11-17 13:36:24 UTC
Referring to Comment #16: 
 
Hm, I can't find the profile_hit() function in arch/x86_64/kernel/time.c for 
2.6.9-rc4-mm1 (vanilla tree).  Actually 'grep -r profile_hit arch/x86_64/*' 
gives me nothing ...  Patch, please? 
 
Comment 20 Zwane Mwaikambo 2004-11-17 14:00:02 UTC
Sorry, I meant profile_pc.
Comment 21 Rafael J. Wysocki 2005-01-13 08:29:14 UTC
Sorry for stalling this, but I've been working in urgent mode for quite some 
time.  Is it still relevant or should I check a newer kernel (if so, which 
one)? 
 
Comment 22 Zwane Mwaikambo 2005-01-13 08:40:46 UTC
Perhaps just test the latest 2.6-mm kernel and we can start again from there.
Comment 23 Rafael J. Wysocki 2005-01-15 06:08:38 UTC
It appears that 2.6.11-rc1-mm1 has the same symptoms.  Now I'm supposed to 
convert arch/x86_64/kernel/time.c:profile_pc() into a noop, remove "noapic" 
from the kernel command line and see what happens, right? 
 
 
Comment 24 Zwane Mwaikambo 2005-01-16 20:49:48 UTC
No, leave profile_pc alone; that was for the crash you had in older kernels
(which has been fixed now). Please try 'noapic'.
Comment 25 Rafael J. Wysocki 2005-01-17 03:58:32 UTC
Well, you lost me. 
 
Let me say once again what the situation is right now (ie in 2.6.10-rc1-mm1): 
when I boot with "noapic", everything's fine and dandy, but if I don't boot 
with "noapic", USB ceases to work after some time, which then causes problems 
with the sound chip.  Is that what you asked me to verify? 
 
Comment 26 Zwane Mwaikambo 2005-01-22 17:08:35 UTC
Yes, that's what I wanted you to verify. I very much suspect that it's a BIOS
issue, unless you can narrow down a 2.6 version which works with the IOAPIC.
Comment 27 Rafael J. Wysocki 2005-01-23 02:04:00 UTC
Oh, I'm sure it is a BIOS issue, but you wanted to debug it some time ago. :-) 
That's why I created this bugzilla entry and that's why I asked if it was 
still relevant.  I can live with it just fine. 
Comment 28 Rafael J. Wysocki 2005-03-01 04:42:31 UTC
Created attachment 4621
/proc/interrupts for 2.6.11-rc5-mm1 (with noapic)

I have upgraded the BIOS. ;-)

The issue remains but now I can say it's related to the sound chip.  Namely,
the old BIOS used to place the sound chip and ohci_hcd at the same IRQ.  Then,
when IO-APIC was used, ohci_hcd and the sound chip had problems.  Now, the
sound chip shares the IRQ with the network adapter (and FireWire, but I don't
use it), and these devices have problems when IO-APIC is used (with noapic they
work just fine).

Do you think I should talk to the ALSA people?
Comment 29 Tor-bj 2005-04-08 06:22:44 UTC
Hello,
I'm having the same problem on Ubuntu 5.04 with about the same machine, is there
anything I can do to help debug this?

Thanks for a nice kernel!
Comment 30 Zwane Mwaikambo 2005-04-12 05:19:34 UTC
Which kernel version does that Ubuntu version have? 2.6.11 should be working
with nforce3 and IOAPIC.
Comment 31 Rafael J. Wysocki 2005-04-13 10:01:15 UTC
Well, I've just tested 2.6.12-rc2-mm3.  In the APIC mode it starts properly, 
but after some time (usually when the network adapter is heavily loaded) the 
network adapter (sk98lin) and the sound chip (intel8x0 on NForce3) stop working 
(to make them work again I have to reload the sound driver and restart the 
network interface). 
 
This does not occur in the PIC mode. 
 
Comment 32 Tor-bj 2005-04-16 18:11:27 UTC
It runs 2.6.10. I'll report back when I have tried a newer kernel.
Comment 33 Tor-bj 2005-04-18 15:11:01 UTC
I posted this in the ubuntu-bugzilla as
https://bugzilla.ubuntu.com/show_bug.cgi?id=7502
Comment 34 Zwane Mwaikambo 2005-08-07 13:52:13 UTC
I don't think we'll get anywhere here, so I'll close it.
Comment 35 Tor-bj 2005-08-08 06:25:23 UTC
May this comment be relevant?
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=159078#c16
Comment 36 Zwane Mwaikambo 2005-08-09 06:58:47 UTC
No, that's what I originally suspected, as that was the fix that was submitted
within his failing time frame.
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/broken-out/fix-ioapic-on-nvidia-boards.patch
Comment 37 Tor-bj 2005-08-10 04:47:31 UTC
Ok, thanks for your time. For what it's worth, I didn't have this problem with
the 2.6.9-kernel.
