Bug 3639
Summary: | NForce3 problem with IOAPIC | | |
---|---|---|---|
Product: | ACPI | Reporter: | Rafael J. Wysocki (rjwysocki) |
Component: | Config-Interrupts | Assignee: | Zwane Mwaikambo (zwane) |
Status: | REJECTED WILL_NOT_FIX | | |
Severity: | normal | CC: | acpi-bugzilla, torrb |
Priority: | P2 | | |
Hardware: | i386 | | |
OS: | Linux | | |
Kernel Version: | 2.6.10-rc1 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments:
dmesg output for 2.6.9-mm1 (without noapic)
/proc/interrupts for 2.6.9-mm1 (without noapic)
dmesg output for 2.6.10-rc1-mm2 (right after failure)
/proc/interrupts for 2.6.10-rc1-mm2 (right after failure)
dont ignore timer override
Oops trace for 2.6.9-rc4-mm1
The output of acpidmp, Linux 2.6.10-rc2-mm1
/proc/interrupts for 2.6.11-rc5-mm1 (with noapic)
Description
Rafael J. Wysocki
2004-10-25 08:24:46 UTC
Created attachment 3887 [details]
dmesg output for 2.6.9-mm1 (without noapic)
Created attachment 3888 [details]
/proc/interrupts for 2.6.9-mm1 (without noapic)
Could you also attach /proc/interrupts and dmesg for the failing kernel, for easy reference? Thanks.

The two that I've already attached are for the failing case (not right after the failure, if that is what you mean, but this very kernel with this very command line has failed a couple of times on my box). I'll try to get some "right after the failure" logs for a more recent kernel, though.

Created attachment 3920 [details]
dmesg output for 2.6.10-rc1-mm2 (right after failure)
The "segfault" lines come from initscripts, but I don't know what exactly
causes them to appear.
Created attachment 3921 [details]
/proc/interrupts for 2.6.10-rc1-mm2 (right after failure)
Created attachment 3922 [details]
dont ignore timer override
Could you please test this patch.
It doesn't help (tested on 2.6.10-rc1-mm2).

OK, can you tell me which kernel is the last known working one and what the kernel command line was for it? Thanks.

The last working kernel that I tested was 2.6.9 and the command line for it was:
root=/dev/hdc6 vga=792 resume=/dev/hdc3 pci=routeirq nmi_watchdog=0 console=ttyS0,57600 console=tty0
I haven't tested any -bk kernels after 2.6.9 and prior to 2.6.9-mm1, but the 2.6.9-mm1 kernel fails for me with the above command line.

That's curious, because I'm fairly certain that it was the following patch which broke it, and it was introduced within your time frame.
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/broken-out/fix-ioapic-on-nvidia-boards.patch
Could you please test 2.6.9-rc4 with that patch backed out? Thanks!

I meant 2.6.9-rc4-mm1 of course; please test with and without the patch.

Created attachment 4027 [details]
Oops trace for 2.6.9-rc4-mm1
Well, it's a bit tricky. 2.6.9-rc4-mm1 oopses for me at KDE startup (always)
with a trace similar to the attached one (both with and without the patch).
However, if I boot it with pci=routeirq _and_ the patch is _reversed_, it
effectively assumes noapic (ie it uses XT-PIC instead of APIC). If I boot it
_without_ pci=routeirq, it uses APIC anyway.
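
A quick way to confirm which mode a given boot ended up in is to look at the controller column of /proc/interrupts. The following is a minimal sketch, assuming the 2.6-era layout in which each numbered line reads "IRQ: per-CPU counts, controller name (XT-PIC, IO-APIC-edge, IO-APIC-level), device names". The function names and the sample text are made up for illustration and are not taken from the attachments.

```python
def irq_controllers(text):
    """Map each numbered IRQ to the controller name shown on its line."""
    result = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header and non-numeric rows such as NMI/LOC/ERR.
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        # Drop the per-CPU interrupt counters (one per CPU).
        while fields and fields[0].isdigit():
            fields.pop(0)
        if fields:
            result[int(head)] = fields[0]
    return result


def interrupt_mode(text):
    """Return "APIC", "PIC" or "unknown" for a /proc/interrupts dump."""
    names = set(irq_controllers(text).values())
    if any(name.startswith("IO-APIC") for name in names):
        return "APIC"
    if "XT-PIC" in names:
        return "PIC"
    return "unknown"


# Hypothetical sample, shaped like a boot that fell back to the PIC.
SAMPLE_PIC = """\
           CPU0
  0:     100000          XT-PIC  timer
  1:        200          XT-PIC  i8042
NMI:          0
"""

print(interrupt_mode(SAMPLE_PIC))
```

On a live system the same function could be fed the contents of /proc/interrupts read as a string.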
You can edit profile_hit() to be a nop: just remove the code within the function. Also, a few things for you to try: enable MPS 1.4 (instead of 1.1) in the BIOS, run the following program and send the output:
http://people.redhat.com/zaitcev/linux/mptable-2.0.15a-1.i386.rpm
Also, if possible, could you get output similar to the following from your kernel boot?
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ2 -> 0:2
IRQ3 -> 0:3
...
IRQ22 -> 0:22

OK. Should I do it for 2.6.9-rc4-mm1, or can I choose something newer (e.g. one that doesn't oops)? If so, will 2.6.10-rc2-mm1 be fine?

Please try 2.6.9-rc4-mm1 and edit arch/x86_64/kernel/time.c:profile_hit, remove all the code within it and make it simply return 0. Would you prefer a patch for that too?

dmesg: >>> ERROR: Invalid checksum
Please verify that you're running the latest BIOS. Then please attach the output from acpidmp, available in /usr/sbin or in pmtools here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils

Created attachment 4068 [details]
The output of acpidmp, Linux 2.6.10-rc2-mm1

Referring to Comment #17: AFAIK, there are no BIOS upgrades for my box, but I wouldn't upgrade anyway (it's a notebook and I have no replacement ;-)). Sorry. The output of acpidmp is attached.

Referring to Comment #16: Hm, I can't find the profile_hit() function in arch/x86_64/kernel/time.c for 2.6.9-rc4-mm1 (vanilla tree). Actually 'grep -r profile_hit arch/x86_64/*' gives me nothing ... Patch, please?

Sorry, I meant profile_pc.

Sorry for stalling this, but I've been working in urgent mode for quite some time. Is it still relevant or should I check a newer kernel (if so, which one)?

Perhaps just test the latest 2.6-mm kernel and we can start again from there.

It appears that 2.6.11-rc1-mm1 has the same symptoms. Now I'm supposed to convert arch/x86_64/kernel/time.c:profile_pc() into a noop, remove "noapic" from the kernel command line and see what happens, right?

No, leave profile_pc alone; that was for the crash you had in older kernels (which has been fixed now). Please try 'noapic'.

Well, you lost me. Let me say once again what the situation is right now (i.e. in 2.6.10-rc1-mm1): when I boot with "noapic", everything's fine and dandy, but if I don't boot with "noapic", USB ceases to work after some time, and that then causes problems with the sound chip. Is that what you asked me to verify?

Yes, that's what I wanted you to verify. I very much suspect that it's a BIOS issue, unless you can narrow down a 2.6 version which works with the IOAPIC.

Oh, I'm sure it is a BIOS issue, but you wanted to debug it some time ago. :-) That's why I created this bugzilla entry and that's why I asked if it was still relevant. I can live with it just fine.

Created attachment 4621 [details]
/proc/interrupts for 2.6.11-rc5-mm1 (with noapic)
I have upgraded the BIOS. ;-)
The issue remains but now I can say it's related to the sound chip. Namely,
the old BIOS used to place the sound chip and ohci_hcd at the same IRQ. Then,
when IO-APIC was used, ohci_hcd and the sound chip had problems. Now, the
sound chip shares the IRQ with the network adapter (and FireWire, but I don't
use it), and these devices have problems when IO-APIC is used (with noapic they
work just fine).
Do you think I should talk to the ALSA people?
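
For comparing the old-BIOS and new-BIOS layouts, it can help to list which IRQ lines carry more than one device. The following is a minimal sketch, assuming the 2.6-era /proc/interrupts layout (IRQ number, per-CPU counts, one controller token, then a comma-separated device list). The function name and the sample text, including its IRQ number and device names, are invented for illustration and do not reproduce the attachments.

```python
def shared_irqs(text):
    """Return {irq: [device, ...]} for IRQ lines that list several devices."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header and non-numeric rows such as NMI/LOC/ERR.
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        # Drop the per-CPU interrupt counters.
        while fields and fields[0].isdigit():
            fields.pop(0)
        if len(fields) < 2:
            continue
        # fields[0] is the controller name; the remainder is the device list.
        devices = [d.strip() for d in " ".join(fields[1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared


# Hypothetical sample: network, sound and FireWire on one line.
SAMPLE = """\
           CPU0
  0:    1234567    IO-APIC-edge  timer
  1:       4321    IO-APIC-edge  i8042
 19:      98765   IO-APIC-level  sk98lin, intel8x0, ohci1394
NMI:          0
"""

for irq, devices in shared_irqs(SAMPLE).items():
    print(irq, devices)
```

Running it against dumps taken with and without noapic would show whether the sharing itself changes between the two modes or only the behaviour under load.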
Hello, I'm having the same problem on Ubuntu 5.04 with about the same machine. Is there anything I can do to help debug this? Thanks for a nice kernel!

Which kernel version does that Ubuntu version have? 2.6.11 should be working with nforce3 and IOAPIC.

Well, I've just tested 2.6.12-rc2-mm3. In the APIC mode it starts properly, but after some time (usually when the network adapter is heavily loaded) the network adapter (sk98lin) and the sound chip (intel8x0 on NForce3) stop working (to make them work again I have to reload the sound driver and restart the network interface). This does not occur in the PIC mode.

It runs 2.6.10. I'll report back when I have tried a newer kernel.

I posted this in the ubuntu-bugzilla as https://bugzilla.ubuntu.com/show_bug.cgi?id=7502

I don't think we'll get anywhere here, so I'll close it.

May this comment be relevant?
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=159078#c16

No, that's what I originally suspected, as that was the fix that was submitted within his failing time frame.
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/broken-out/fix-ioapic-on-nvidia-boards.patch

Ok, thanks for your time. For what it's worth, I didn't have this problem with the 2.6.9 kernel.