Most recent kernel where this bug did not occur: 2.6.17
Distribution: SuSE 10.1
Hardware Environment: Athlon XP 1.6 GHz, MSI motherboard (MS-6390-L v1.0) with the VIA KM266 chipset. The motherboard has an onboard RTL8139 network card, AC97/8233A VIA sound and south bridge, and an integrated S3 Savage4-PRO+ 266DDR. 512 MB RAM.
Software Environment: SuSE/Ubuntu/Arch Linux. I think that only Fedora works, but I am not sure; it has been a while since I tried Fedora on this machine.
Problem Description: The kernel will boot only to safe mode and won't mount any partitions (SuSE). With other distros it will boot, but no PCI hardware will work (Ubuntu, Arch). All these problems go away only if you pass the "nolapic" kernel parameter. But I believe that the user experience is not good if the user has to pass kernel parameters; IMHO this is something that must be fixed.
Steps to reproduce: Just try to install a recent distro on this machine.
Boot message of the SuSE DVD with apic=debug:
https://bugzilla.novell.com/attachment.cgi?id=92035&action=view
Dmesg of SuSE:
https://bugzilla.novell.com/attachment.cgi?id=92036&action=view
ACPI dump logs tarball (acpi.txt is from acpidmp, and acpidump.txt is from acpidump):
https://bugzilla.novell.com/attachment.cgi?id=93759&action=view
Novell won't look at the problem, so you are my only hope. This is probably a BIOS bug, but the point is that Windows works (including Vista), BeOS and BSD work, and even kernel 2.4 works perfectly. So I think that this problem should be fixed, as kernel 2.6 is the only one that misbehaves, and also because older hardware is what to expect in most Enterprise businesses today. Thanks.
you say:
> Most recent kernel where this bug did not occur: 2.6.17
so the upstream kernel is fine?
In particular, could you test 2.6.19-rc5-mm1? (that has a couple of APIC fixes)
I am sorry, but I misspoke when I said that it did not occur on 2.6.17. The bug did happen with 2.6.17. I don't know if it still happens or not. I will have to wait for a distro that ships these kernel versions before I can try them; I won't manually build one...
This Bugzilla is only for bugs in unmodified kernels from ftp.kernel.org. If you are only using distribution kernels and not ftp.kernel.org kernels, please ask your distribution for support.
The rationale for this is:
- the bug might be caused by a patch in the distribution kernel
- if you are not able to test patches, it's much harder to find a solution
I think the bug is part of the mainstream kernel, because it happens with all the distros I tried (suse, ubuntu, arch, fedora).
This still leaves my second point that you will not be able to test patches. If someone thinks he has found the solution for your problem he will create a patch - and how can you ever verify whether such a patch really fixes your problem?
http://wiki.fini.net/bin/view/Support/LinuxLosingPCIonMS6390 is a page I created discussing this issue. I am happy to test patches.
This is a uni-processor board with a LAPIC/IOAPIC. In the past, the distros supported these boards with special uni-processor kernels that disabled the IOAPIC, but recently they've cut over to uni-processor kernels that support LAPIC/IOAPIC, and finally SMP kernels by default. The issue is likely the IOAPIC, not the LAPIC. Please verify that "noapic" is a sufficient workaround rather than "nolapic", and attach the dmesg and /proc/interrupts from a "noapic" boot if it works. Skip FC4, FC5 and SL10.1. Try FC6 or OpenSuSE 10.2, which will install directly to 2.6.18. (Indeed, FC6 will then net update to 2.6.19 -- and I'm told it is very close to upstream) If you can get a recent upstream kernel.org kernel to fail then go ahead and re-open this bug report. The first thing I'll be looking for is a dump of the interrupts from Christopher's W2K boot to show that Windows can handle the IOAPIC on this board.
Created attachment 10124: /proc/interrupts output from 2.6.19-1.2895.fc6 kernel
Created attachment 10125: dmesg output from 2.6.19-1.2895.fc6 kernel
nolapic was superfluous. noapic was sufficient. I've attached the Linux output requested. I don't have permission, but the summary should be updated to say noapic instead of nolapic.
Thanks for verifying that "noapic" is a sufficient workaround. I see you are now running 2.6.19-1.2895.fc6 -- I assume it also fails to boot if you drop off the "noapic" parameter?
Another workaround might be to use "acpi=noirq" -- as this box has an MP table. But this would be just another workaround...
It turns out that Thomas Renninger already found the root cause of this failure: https://bugzilla.novell.com/show_bug.cgi?id=179024
The _PRT entries are garbled, just like they were in bug 1164:
    Package (0x04) { 0x0012FFFF, 0x00, 0x00, \_SB.PCI0.LNKA },
    Package (0x04) { 0x0012FFFF, 0x01, 0x00, \_SB.PCI0.LNKB },
    Package (0x04) { 0x0012FFFF, 0x02, 0x00, \_SB.PCI0.LNKC },
    Package (0x04) { 0x0012FFFF, 0x03, 0x00, \_SB.PCI0.LNKD }
The Links should be the 3rd entry, not the 4th.
At the time we got the BIOS fixed, but it turns out we should have implemented a kernel workaround for this BIOS bug then, because Windows continues to allow systems to ship with this bug.
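To make the layout problem concrete, here is a minimal sketch in C of the entries quoted above. It is an illustration only, not kernel or ACPICA code: the types `elem` and `prt_entry` and the function `prt_entry_looks_reversed` are hypothetical names invented for this example. It assumes the standard _PRT element order (Address, Pin, Source, SourceIndex) and flags an entry whose fourth element is not an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for ACPI objects; not the ACPICA types. */
enum elem_type { ELEM_INTEGER, ELEM_NAME };

struct elem {
    enum elem_type type;
    uint64_t value;     /* meaningful when type == ELEM_INTEGER */
    const char *name;   /* meaningful when type == ELEM_NAME */
};

/* One _PRT entry: Address, Pin, Source, SourceIndex (indexes 0..3). */
struct prt_entry {
    struct elem e[4];
};

/*
 * Returns 1 if Source and SourceIndex look swapped, i.e. the element
 * that should be the integer SourceIndex (index 3) is not an integer.
 */
static int prt_entry_looks_reversed(const struct prt_entry *p)
{
    assert(p != NULL);
    return p->e[3].type != ELEM_INTEGER;
}

/* First entry as quoted from the broken BIOS: link name in the 4th slot. */
static const struct prt_entry garbled_entry = {{
    { ELEM_INTEGER, 0x0012FFFF, NULL },
    { ELEM_INTEGER, 0x00, NULL },
    { ELEM_INTEGER, 0x00, NULL },
    { ELEM_NAME, 0, "\\_SB.PCI0.LNKA" },
}};

/* The same entry with the link name in the 3rd slot, as expected. */
static const struct prt_entry correct_entry = {{
    { ELEM_INTEGER, 0x0012FFFF, NULL },
    { ELEM_INTEGER, 0x00, NULL },
    { ELEM_NAME, 0, "\\_SB.PCI0.LNKA" },
    { ELEM_INTEGER, 0x00, NULL },
}};
```

Checking index 3 rather than index 2 is deliberate in this sketch: a hardwired entry may legitimately carry an integer zero as its Source, so the type of index 2 alone does not identify a garbled entry.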
Created attachment 10126: patch to work around garbled _PRT entry vs 2.6.18
Please test this patch, originally written by Shaohua Li for http://bugzilla.kernel.org/show_bug.cgi?id=1164#c39 and forward-ported here. It should apply cleanly to 2.6.18 through 2.6.20-rc5. For 2.6.16 and 2.6.17 it should apply with a 3-line offset.
For a boot with no kernel parameters, please attach the complete dmesg and paste the /proc/interrupts.
To disable the workaround, you can boot with "acpi=strict".
patch in comment #12 applied to acpi-test
shipped in 2.6.21-rc3-git6
closed
There is a problem with this patch, in this line:
    if (ACPI_GET_OBJECT_TYPE (sub_object_list[3]) != ACPI_TYPE_INTEGER) {
In the case where the SourceName and SourceIndex are reversed, if the actual SourceName was unresolved, the object will be null. That is the purpose of the null check later in the code:
    obj_desc = sub_object_list[source_name_index];
    if (obj_desc) {
For safety, a check for a null SourceName must be made. A "correct" _PRT entry will always have a valid integer object in index 3 for the SourceIndex, so the safer code would be:
    if (!sub_object_list[3] ||
        (ACPI_GET_OBJECT_TYPE (sub_object_list[3]) != ACPI_TYPE_INTEGER)) {
The ACPICA patch will simply swap the objects in place:
    /*
     * If the BIOS has erroneously reversed the _PRT SourceName (index 2)
     * and the SourceIndex (index 3), fix it. _PRT is important enough to
     * workaround this BIOS error. This also provides compatibility with
     * other ACPI implementations.
     */
    ObjDesc = SubObjectList[3];
    if (!ObjDesc || (ACPI_GET_OBJECT_TYPE (ObjDesc) != ACPI_TYPE_INTEGER))
    {
        SubObjectList[3] = SubObjectList[2];
        SubObjectList[2] = ObjDesc;
        ACPI_WARNING ((AE_INFO,
            "(PRT[%X].Source) SourceName and SourceIndex are reversed, fixed",
            Index));
    }
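For readers who want to try the logic outside the kernel, the swap with the null check can be sketched as a standalone C function. This is a simplified model under stated assumptions, not the ACPICA source: `struct obj`, `enum obj_type`, and `prt_fix_reversed` are hypothetical names, and a NULL pointer stands in for a SourceName that failed to resolve.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model; not the ACPICA definitions. */
enum obj_type { OBJ_INTEGER, OBJ_REFERENCE };

struct obj {
    enum obj_type type;
};

/*
 * elems[] holds the four elements of one _PRT entry:
 * Address, Pin, Source, SourceIndex. A NULL pointer models a name
 * that could not be resolved.
 *
 * If index 3 is missing or is not an integer, treat indexes 2 and 3
 * as reversed and swap them in place. Returns 1 if a swap was made,
 * 0 if the entry was left untouched.
 */
static int prt_fix_reversed(struct obj *elems[4])
{
    struct obj *idx3;

    assert(elems != NULL);
    idx3 = elems[3];

    /* Test for NULL first so the type is never read through a null pointer. */
    if (idx3 == NULL || idx3->type != OBJ_INTEGER) {
        elems[3] = elems[2];
        elems[2] = idx3;
        return 1;
    }
    return 0;
}
```

In this model, an entry of the form { address, pin, integer, NULL } is also swapped, which leaves the integer in index 3 and the unresolved (NULL) source in index 2 for later code to check, mirroring the behavior described above.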
Bob, what's the status of this bug?
The original patch is already in Linux as far as I know. The updated patch was released in ACPICA 20080609. I'm not sure if it has been integrated into Linux.
The updated patch is in 2.6.27rc as
    commit d0e184abc5983281ef189db2c759d65d56eb1b80
    Author: Bob Moore <robert.moore@intel.com>
    Date:   Tue Jun 10 14:16:47 2008 +0800

        ACPICA: Workaround for reversed _PRT entries from BIOS
I don't have rights to close the bug unfortunately. I guess we'll have to wait for Len to do this?
[Hi Len, an easy way to improve your bug numbers ;-]
Ok got rights to close the bug now.