Bug 1530
Summary: | system hang in APIC mode when we install SuSe 9.0 beta 3 on VIA platform | ||
---|---|---|---|
Product: | ACPI | Reporter: | Hurry Lin (hurrylin) |
Component: | Config-Interrupts | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | blocking | CC: | acpi-bugzilla, andi-bz, herbert |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | SuSE 9.0 AMD x86-64 Beta 3 build | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
More detailed descriptions about above problems and our analysis
the log file gotten from the beta3 which we think the real problem existed in ACPI kernel
IRQ info from "winmsd" run on WinXPSP1 on same hardware with the same bios(1425))
Executing command "lspci -v" with BIOS will not disable that workaround for SuSE 9.0 final version
Acpidump using dynamic routing method for SuSE 9.0 final version.
dmesg
/proc/interrupts
For your reference, this patch can fix system hang issue from bug 1581.
2.6.6 patch
2.4.26 patch
2.6.6 patch -- take2
2.4.26 patch -- take 2
dmesg |
Description
Hurry Lin
2003-11-11 22:24:31 UTC
Steps to reproduce(update): The CRT shows a black screen after we select "Installation" to install SuSE 9.0 beta 3.
Created attachment 1424 [details]
More detailed descriptions about above problems and our analysis
Created attachment 1425 [details]
the log file gotten from the beta3 which we think the real problem existed in ACPI kernel
Created attachment 1426 [details]
IRQ info from "winmsd" run on WinXPSP1 on same hardware with the same bios(1425))
The 1st attachment describes two methods of assigning interrupts, dynamic and static.
1. _PRT uses link devices in APIC mode, eg ALKD for the 4 USB interrupts.
2. _PRT uses static IRQs for slot1.

I assume that both of these snippets are drawn from the current BIOS, rather than from different versions of the BIOS? Please attach the complete output from acpidmp for the current BIOS so I can better understand what it is doing.

I don't understand why the code for ALKD exists. _PRS says that it can only be programmed to IRQ21.
Interrupt(ResourceConsumer,Level,ActiveLow,Shared) {21}
Why have a programmable device that can only be programmed to 1 value?

---
Re: different versions of SuSE linux

> For SuSe 9.0 beta 3, the devices can operate in APIC mode (above IRQ 16)
> when we directly assign IRQ number to each interrupt pin of the device.
> The system will hang when interrupt pins of the device is dynamically
> decided IRQ through device object.

I believe that Andi put a workaround into SuSE Linux 9.0 that recognizes your chip-set and forces the kernel to run in PIC mode. Changing the BIOS will not disable that workaround. Please attach the output of lspci -v, and we can use that to confirm.

It would be ideal if you could reproduce this problem using the latest 2.4 or 2.6 baseline kernel instead of SuSE Linux -- is that possible?

It sounds like the problem exists only when PCI link devices are used in APIC mode, and that the problem does not exist if you use a debug BIOS that assigns static IRQs. While this may be a bug in Linux, I guess I'll have to repeat my question of why you use PCI link devices in APIC mode and why static routing is not sufficient?

> In SuSe 9.0 beta 3 installation, we use original bios that internal devices will
> refer device object of interrupt pin to determine its IRQ number.
> We debug the control methods of device objects refered by internal devices.
> We find that OS execute the internal devices

Re: PIC mode workaround...
The latest 2.4, 2.6, and SL9 x86_64 kernels will disable the APIC whenever they recognize a VIA chip-set. This is a temporary workaround until this problem is fixed. To disable the workaround and boot in APIC mode, you'll need to use cmdline option "apic".
Created attachment 1471 [details]
Executing command "lspci -v" with BIOS will not disable that workaround for SuSE 9.0 final version
Created attachment 1472 [details]
Acpidump using dynamic routing method for SuSE 9.0 final version.
>I don't understand why the code for ALKD exists.
>_PRS says that it can only be programmed to IRQ21.
> Interrupt(ResourceConsumer,Level,ActiveLow,Shared) {21}
>Why have a programmable device that can only be programmed to 1 value?
>While this may be a bug in Linux, I guess I'll have to repeat my question of why you
>use PCI link devices in APIC mode and why static routing is not sufficient?

For PCI slots, we always use the static routing method. For internal PCI devices, we need to use the dynamic IRQ routing method in the current BIOS because we need compatibility with all VIA chipsets. On older chipsets, internal PCI devices must be programmed to IRQs below 16. On newer chipsets, internal devices are hardwired to fixed IRQs of 16 or above. We could use the static routing method, but then we would have to split the BIOS into two, one for each kind of chipset. That should be the last resort if we can't find a better solution.

>The 1st attachment describes two methods of assigning interrupts, dynamic and static.
>1. _PRT uses link devices in APIC mode, eg ALKD for the 4 USB interrupts.
>2. _PRT uses static IRQs for slot1.

On newer chipsets, both methods give the USB controller the same IRQ 21 in APIC mode under Windows. But we must adopt the static routing method to pass SuSE installation. At present we use one BIOS that adopts the dynamic routing method for both kinds of VIA chipsets. And this is just one example among all the internal PCI devices.

>Please attach the complete output from acpidmp for the current BIOS so I can better understand
>what it is doing.

I have added the acpidump from the BIOS using the dynamic routing method, for SuSE 9.0 final version.

>I believe that Andi put a workaround into SuSE Linux 9.0 that recognizes your chip-set and
>forces
>the kernel to run in PIC mode. Changing the BIOS will not disable that workaround. Please
>attach the output of lspci -v, and we can use that to confirm.

I have attached a text file "lspci.txt". Changing the BIOS will not disable that workaround for SuSE 9.0 final version.

>If I read the XML from winmsd correctly, it is telling us that windows does this:
>IRQ 0 System timer
>IRQ 1 Standard 101/102-Key or Microsoft Natural PS/2 Keyboard
>IRQ 3 Communications Port (COM2)
>IRQ 4 Communications Port (COM1)
>IRQ 6 Standard floppy disk controller
>IRQ 8 System CMOS/real time clock
>IRQ 9 Microsoft ACPI-Compliant System
>IRQ 10 MPU-401 Compatible MIDI Device
>IRQ 11 RAID Controller
>IRQ 12 PS/2 Compatible Mouse
>IRQ 13 Numeric data processor
>IRQ 14 Primary IDE Channel
>IRQ 15 Secondary IDE Channel
>IRQ 16 NVIDIA GeForce2 GTS/GeForce2 Pro (Microsoft Corporation)
>IRQ 21 VIA Rev 5 or later USB Universal Host Controller
>IRQ 21 VIA Rev 5 or later USB Universal Host Controller
>IRQ 21 VIA Rev 5 or later USB Universal Host Controller
>IRQ 21 VIA Rev 5 or later USB Universal Host Controller
>IRQ 21 Standard Enhanced PCI to USB Host Controller
>IRQ 22 VIA AC'97 Enhanced Audio Controller (WDM)
>IRQ 23 VIA Compatable Fast Ethernet Adapter
>Is this from a static or dynamic IRQ routing BIOS, or does windows program the APIC the same
>in both cases?

For Windows, we get the same results in both cases.

>It would be ideal if you could reproduce this problem using the latest 2.4 or 2.6 baseline kernel
>instead of SuSE Linux -- is that possible?

I executed the command "uname -r" and got "2.4.21-102-default" for SuSE 9.0 final version. Is SuSE 9.0 final version a 2.4 baseline kernel? If not, where can we get this kind of OS from?

Additional comment: you can verify if you passed the apic parameter correctly by checking /proc/cmdline in the running system. If there is no apic in there you passed it incorrectly and it cannot work. Can you check if that is the case?

>you can verify if you passed the apic parameter
>correctly by checking /proc/cmdline in the running system. If there is no apic
>in there you passed it incorrectly and it cannot work.
For the attachments (acpidmp, lspci.txt), we used the dynamic IRQ routing method
without "apic" on the bootloader command line. So that running system, which has
no "apic" in "/proc/cmdline", is in PIC mode. The system still hangs when we use
the dynamic routing method with "apic" on the bootloader command line.
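
The /proc/cmdline check suggested above can be sketched in a few lines of Python. This is only an illustration: the helper name `has_boot_option` is hypothetical, and it assumes the option must appear as a separate whitespace-delimited word, so that "noapic" is not mistaken for "apic".

```python
from pathlib import Path


def has_boot_option(cmdline: str, option: str = "apic") -> bool:
    """Return True if `option` is present as a whole word in a kernel command line.

    Hypothetical helper for illustration; splitting on whitespace means
    "noapic" does not count as "apic".
    """
    return option in cmdline.split()


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        text = proc_cmdline.read_text()
        print("apic passed on the command line:", has_boot_option(text))
```

If this reports False, the "apic" option never reached the kernel, and by the earlier comments a SuSE 9.0 system on a VIA chipset would be running in PIC mode.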
Related to http://bugzilla.kernel.org/show_bug.cgi?id=1774 ?

According to Bug 1774, I updated the kernel to linux-2.6.2-rc2.tar.bz2 on SuSE 9.0 beta 6 and added the patch, but the system still hangs at the same point during boot.

Please apply the 2.6.5 debug patch from bug 1581 and attach the resulting dmesg and /proc/interrupts using the "dynamic" PCI Interrupt Link version of the BIOS. The patch ignores the device's enable bit (_STA), and also prints out some more info about the link.

After we applied kernel linux 2.6.5 and added the last patch from bug 1581, the system booted successfully with the "dynamic" PCI Interrupt Link version of the BIOS.
Created attachment 2761 [details]
dmesg
Created attachment 2762 [details]
/proc/interrupts
After we apply kernel 2.6.5 with the proposed patch from bug 1581, the system hang is fixed with the BIOS using the "dynamic" IRQ routing method. Thanks for your kind support. Can you tell us which kernel version will include this patch? And when you add this patch to the kernel, can you also remove the workaround that disables the APIC whenever a VIA chipset is recognized?
Created attachment 2791 [details]
For your reference, this patch can fix system hang issue from bug 1581.

ACPI: _CRS 11 not found in _PRS
LENB: extended resource for 20
pci_link-0417 [30] acpi_pci_link_set : Enabled IRQ 20, BIOS reported IRQ 11, using IRQ 20
ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 20
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xc9 -> IRQ 20 Mode:1 Active:1)
00:00:0f[A] -> 2-20 -> IRQ 20

This output shows that the VIA problem is a duplicate of bug 2567, which is fixed as part of the patch in bug 1581 (which is in linux 2.6.6 and 2.4.27).

Ie. this BIOS bug is that _CRS is returning an IRQ that is not listed in _PRS. In this case, even after _SRS 20, _CRS still returns 11. In the past we believed _CRS, now we just print a warning and ignore it.

Please build 2.6.6 (or 2.6.5+patch above) with a big msgbuf:
CONFIG_LOG_BUF_SHIFT=16
and attach the output from
dmesg -s64000
It will tell us if Linux sees any other issues with this BIOS.

thanks,
-Len
Created attachment 2856 [details]
2.6.6 patch
this patch removes the x86_64 IO-APIC disable workaround for both VIA and
NVIDIA.
check_ioapic() is not completely deleted b/c it looks like it is still used for
an iommu
workaround.
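
The _CRS/_PRS behaviour Len describes earlier in this thread (warn when _CRS reports an IRQ that _PRS does not list, then use the IRQ that was programmed via _SRS) can be sketched as below. This is a simplified model and not the kernel code: the function `choose_link_irq` is hypothetical, and the _PRS list of [20] for ALKA is an assumption inferred from the log output, not taken from the acpidump.

```python
def choose_link_irq(crs_irq, prs_irqs, srs_irq):
    """Model of the post-fix policy for a PCI interrupt link (illustrative only).

    crs_irq  -- IRQ the BIOS reports through _CRS
    prs_irqs -- IRQs that _PRS lists as possible settings
    srs_irq  -- IRQ that was programmed through _SRS
    Returns (irq_to_use, warnings).
    """
    if srs_irq not in prs_irqs:
        raise ValueError(f"IRQ {srs_irq} is not a possible setting for this link")
    warnings = []
    if crs_irq not in prs_irqs:
        # Old behaviour, per the thread: believe _CRS. New behaviour: warn and ignore it.
        warnings.append(f"_CRS {crs_irq} not found in _PRS")
    if crs_irq != srs_irq:
        warnings.append(f"BIOS reported IRQ {crs_irq}, using IRQ {srs_irq}")
    return srs_irq, warnings


# Values from the log in this report; the _PRS contents are assumed.
irq, notes = choose_link_irq(crs_irq=11, prs_irqs=[20], srs_irq=20)
```

With those inputs the sketch settles on IRQ 20 and records both warnings, which mirrors the sequence seen in the attached dmesg.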
Created attachment 2857 [details]
2.4.26 patch
remove VIA/NVIDIA disable IOAPIC x86_64 workaround in 2.4.
As this is the only use of check_ioapic(), that routine is removed.
I don't think the patch is correct - afaik nvidia is still broken and needs the workaround. Disabling it for via would probably be possible now, although we still need the iommu workaround.
Created attachment 2868 [details]
2.6.6 patch -- take2
updated 2.6.6 patch:
VIA: retain IOMMU disable, delete IOAPIC disable
NVIDIA: retain IOAPIC disable
Created attachment 2869 [details]
2.4.26 patch -- take 2
updated patch:
VIA: delete IOAPIC disable (and leave a spot for adding IOMMU disable if
somebody wants to)
NVIDIA: retain IOAPIC disable
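
As a rough picture of what the take-2 2.6.6 patch leaves in place, the per-vendor policy can be written as a small table. This is a sketch, not the code from the patch: the quirk names and the function `ioapic_enabled` are hypothetical, and treating "apic" on the command line as an override for any vendor is an assumption based on the earlier comment about that option. The PCI vendor IDs are the standard ones for VIA (0x1106) and NVIDIA (0x10de).

```python
VIA = 0x1106     # PCI vendor ID for VIA
NVIDIA = 0x10DE  # PCI vendor ID for NVIDIA

# Workarounds still applied after the take-2 2.6.6 patch, as described above.
# The quirk names are made up for this sketch.
QUIRKS_TAKE2 = {
    VIA: {"disable_iommu"},      # IOAPIC disable deleted, IOMMU disable retained
    NVIDIA: {"disable_ioapic"},  # IOAPIC disable retained
}


def ioapic_enabled(vendor_id, apic_on_cmdline=False, quirks=QUIRKS_TAKE2):
    """Return True if the IO-APIC would be used under this simplified policy."""
    if apic_on_cmdline:
        # Assumption: "apic" on the command line overrides the workaround.
        return True
    return "disable_ioapic" not in quirks.get(vendor_id, set())
```

Under this model a VIA system comes up in APIC mode without any boot option, while an NVIDIA system still needs "apic" on the command line.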
Created attachment 2922 [details]
dmesg
The attachment is the dmesg from 2.6.6 rebuilt with a big msgbuf:
CONFIG_LOG_BUF_SHIFT=16.
We rebuilt kernel 2.6.6 with the patch that retains the IOMMU disable and deletes the IOAPIC disable for VIA chipsets. The system boots successfully and works in APIC mode without "apic" on the bootloader command line. Can you tell us which kernel version will include this patch that deletes the IOAPIC disable for VIA chipsets? Thanks!

Did you test it on all VIA chipsets that support x86-64 and possibly Intel Prescott? (for EM64T CPUs - the latter can be done with 32bit kernels too as long as you make sure the APIC was enabled) If that all works I can remove the test for VIA chipset for 2.6.7.

This is checked in on top of 2.4.27-pre3 and will also be in 2.6.7-rc1.

closing. If regressions are seen we'll back this out and re-open the bug report.