Bug 1530

Summary: system hang in APIC mode when we install SuSe 9.0 beta 3 on VIA platform
Product: ACPI
Reporter: Hurry Lin (hurrylin)
Component: Config-Interrupts
Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX
Severity: blocking
CC: acpi-bugzilla, andi-bz, herbert
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: SuSE 9.0 AMD x86-64 Beta 3 build
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: More detailed descriptions about above problems and our analysis
the log file gotten from the beta3 which we think the real problem existed in ACPI kernel
IRQ info from "winmsd" run on WinXPSP1 on same hardware with the same bios(1425))
Executing command "lspci -v" with BIOS will not disable that workaround for SuSE 9.0 final version
Acpidump using dynamic routing method for SuSE 9.0 final version.
dmesg
/proc/interrupts
For your reference, this patch can fix system hang issue from bug 1581.
2.6.6 patch
2.4.26 patch
2.6.6 patch -- take2
2.4.26 patch -- take 2
dmesg

Description Hurry Lin 2003-11-11 22:24:31 UTC
Distribution: VIA K8 system

Hardware Environment: K8 CPU + VIA platform 

Software Environment:

Problem Description:
We have hit an installation issue in APIC mode when installing SuSE 9.0 beta 3 
on a VIA platform. In the ACPI part of the BIOS, each interrupt pin (i.e. INTA, 
INTB, INTC, INTD) of a device gets an assigned IRQ when we declare the device's 
IRQ routing in the _PRT configuration object. If the IRQs for a device's 
interrupt pins are decided dynamically through a device object, the system 
hangs during installation after the OS executes the _CRS control method on the 
referenced device object and the current IRQ is returned. If we assign an IRQ 
number directly to each interrupt pin of the device, the installation succeeds. 
We think there is a problem in ACPI parsing during installation.

We also found a difference in behavior between OS versions. With SuSE 9.0 
beta 3, the devices can operate in APIC mode (above IRQ 16) when we assign an 
IRQ number directly to each interrupt pin of the device, and the system hangs 
when the IRQs for the interrupt pins are decided dynamically through a device 
object. With SuSE 9.0 beta 6 and the final version, the devices always operate 
only in PIC mode. Customers cannot accept a chipset whose device IRQ routing is 
limited to PIC mode. We hope that future SuSE versions will return to the PCI 
IRQ routing behavior of SuSE 9.0 beta 3 so that our platform can operate in 
APIC mode. VIA chipsets would then at least be able to operate in APIC mode 
when the interrupt pins of the device are directly assigned IRQ numbers. 

Steps to reproduce:
The CRT shows a black screen after we select "Installation" to install SuSE 9.0 beta 6.
Comment 1 Hurry Lin 2003-11-11 22:32:26 UTC
Steps to reproduce(update):
The CRT shows a black screen after we select "Installation" to install SuSE 9.0 beta 3.
Comment 2 Hurry Lin 2003-11-11 22:45:46 UTC
Created attachment 1424 [details]
More detailed descriptions about above problems and our analysis
Comment 3 Hurry Lin 2003-11-11 22:48:35 UTC
Created attachment 1425 [details]
the log file gotten from the beta3 which we think the real problem existed in ACPI kernel
Comment 4 Hurry Lin 2003-11-11 22:54:54 UTC
Created attachment 1426 [details]
IRQ info from "winmsd" run on WinXPSP1 on same hardware with the same bios(1425))
Comment 5 Len Brown 2003-11-12 00:35:59 UTC
The 1st attachment describes two methods of assigning interrupts, dynamic and static. 
1. _PRT uses link devices in APIC mode, eg ALKD for the 4 USB interrupts. 
2. _PRT uses static IRQs for slot1. 
 
I assume that both of these snippets are drawn from the current BIOS, rather than different 
versions of the BIOS? 
 
Please attach the complete output from acpidmp for the current BIOS so I can better understand 
what it is doing. 
 
I don't understand why the code for ALKD exists. 
_PRS says that it can only be programmed to IRQ21. 
                    Interrupt(ResourceConsumer,Level,ActiveLow,Shared) {21} 
Why have a programmable device that can only be programmed to 1 value?  
--- 
Re: different versions of SuSE linux 
 
> For SuSe 9.0 beta 3, the devices can operate in APIC mode (above IRQ 16) 
> when we directly assign IRQ number to each interrupt pin of the device. 
> The system will hang when interrupt pins of the device is dynamically 
> decided IRQ through device object. 
 
I believe that Andi put a workaround into SuSE Linux 9.0 that recognizes your chip-set and forces 
the kernel to run in PIC mode.  Changing the BIOS will not disable that workaround.  Please 
attach the output of lspci -v, and we can use that to confirm. 
 
It would be ideal if you could reproduce this problem using the latest 2.4 or 2.6 baseline kernel 
instead of SuSE Linux -- is that possible? 
 
It sounds like the problem exists only when PCI link devices are used in APIC mode, 
and that the problem does not exist if you use a debug BIOS that assigns static IRQs. 
 
While this may be a bug in Linux, I guess I'll have to repeat my question of why you 
use PCI link devices in APIC mode and why static routing is not sufficient? 
 
> In SuSe 9.0 beta 3 installation, we use original bios that internal devices 
> will refer device object of interrupt pin to determine its IRQ number. 
>  We debug the control methods of device objects refered by internal devices. 
> We find that OS execute the internal devices
Comment 6 Len Brown 2003-11-12 01:06:34 UTC
Re: PIC mode workaround... 
The latest 2.4, 2.6, and SL9 x86_64 kernels will disable the APIC whenever they recognize a VIA 
chip-set.  This is a temporary workaround until this problem is fixed.  To disable the workaround 
and boot in APIC mode, you'll need to use cmdline option "apic". 
 
Comment 7 Hurry Lin 2003-11-18 19:21:31 UTC
Created attachment 1471 [details]
Executing command "lspci -v" with BIOS will not disable that workaround for SuSE 9.0 final version
Comment 8 Hurry Lin 2003-11-18 19:23:37 UTC
Created attachment 1472 [details]
Acpidump using dynamic routing method for SuSE 9.0 final version.
Comment 9 Hurry Lin 2003-11-18 19:40:24 UTC
>I don't understand why the code for ALKD exists. 
>_PRS says that it can only be programmed to IRQ21. 
>                    Interrupt(ResourceConsumer,Level,ActiveLow,Shared) {21} 
>Why have a programmable device that can only be programmed to 1 value?  

>While this may be a bug in Linux, I guess I'll have to repeat my question of 
>why you use PCI link devices in APIC mode and why static routing is not 
>sufficient? 

For PCI slots, we always use the static routing method. For internal PCI 
devices, we need to use the dynamic IRQ routing method in the current BIOS 
because it must be compatible with all VIA chipsets. On older chipsets, internal 
PCI devices must be programmed to IRQs below 16. On newer chipsets, internal 
devices are hardwired to fixed IRQs of 16 or above. We could use the static 
routing method, but then we would have to maintain two separate BIOSes for the 
two kinds of chipsets. That should be a last resort if we cannot find a better 
solution.

>The 1st attachment describes two methods of assigning interrupts, dynamic and 
>static. 
>1. _PRT uses link devices in APIC mode, eg ALKD for the 4 USB interrupts. 
>2. _PRT uses static IRQs for slot1.

On newer chipsets, both methods assign the same IRQ 21 to the USB controller 
in APIC mode under Windows. But we must adopt the static routing method to get 
through SuSE installation. At present we use a single BIOS with the dynamic 
routing method for both kinds of VIA chipsets. The USB controller is just one 
example; the same applies to all internal PCI devices.

>Please attach the complete output from acpidmp for the current BIOS so I can 
>better understand what it is doing. 

I have added the acpidump output from the BIOS using the dynamic routing 
method for SuSE 9.0 final version. 

>I believe that Andi put a workaround into SuSE Linux 9.0 that recognizes your 
>chip-set and forces the kernel to run in PIC mode.  Changing the BIOS will not 
>disable that workaround.  Please attach the output of lspci -v, and we can use 
>that to confirm. 

I have attached a text file, "lspci.txt". Changing the BIOS will not disable 
that workaround for SuSE 9.0 final version.

>If I read the XML from winmsd correctly, it is telling us that windows does 
>this: 
 
>IRQ 0   System timer 
>IRQ 1   Standard 101/102-Key or Microsoft Natural PS/2 Keyboard 
>IRQ 3   Communications Port (COM2) 
>IRQ 4   Communications Port (COM1) 
>IRQ 6   Standard floppy disk controller 
>IRQ 8   System CMOS/real time clock 
>IRQ 9   Microsoft ACPI-Compliant System 
>IRQ 10  MPU-401 Compatible MIDI Device 
>IRQ 11  RAID Controller 
>IRQ 12  PS/2 Compatible Mouse 
>IRQ 13  Numeric data processor 
>IRQ 14  Primary IDE Channel 
>IRQ 15  Secondary IDE Channel 
>IRQ 16  NVIDIA GeForce2 GTS/GeForce2 Pro (Microsoft Corporation) 
>IRQ 21  VIA Rev 5 or later USB Universal Host Controller 
>IRQ 21  VIA Rev 5 or later USB Universal Host Controller 
>IRQ 21  VIA Rev 5 or later USB Universal Host Controller 
>IRQ 21  VIA Rev 5 or later USB Universal Host Controller 
>IRQ 21  Standard Enhanced PCI to USB Host Controller 
>IRQ 22  VIA AC'97 Enhanced Audio Controller (WDM) 
>IRQ 23  VIA Compatable Fast Ethernet Adapter 
 
>Is this from a static or dynamic IRQ routing BIOS, or does windows program the 
>APIC the same in both cases? 

For Windows, we get the same results in both cases.

>It would be ideal if you could reproduce this problem using the latest 2.4 or 
>2.6 baseline kernel instead of SuSE Linux -- is that possible? 

Running "uname -r" on SuSE 9.0 final version reports "2.4.21-102-default". 
Is the SuSE 9.0 final version kernel a 2.4 baseline kernel? If not, where can 
we get a baseline kernel from?

Comment 10 Andi Kleen 2003-11-18 19:59:21 UTC
Additional comment: you can verify if you passed the apic parameter
correctly by checking /proc/cmdline in the running system. If there is
no 

apic

in there you passed it incorrectly and it cannot work.

Can you check if that is the case?
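
The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names has_boot_option and read_cmdline are made up for this example, and it assumes the option is passed as the bare word "apic" (so "noapic" or "apic=..." forms do not count as a match).

```python
def has_boot_option(cmdline: str, option: str = "apic") -> bool:
    """Return True if `option` is a whole, whitespace-separated word in cmdline."""
    # Splitting on whitespace avoids matching "noapic" as "apic".
    return option in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line of the running system."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read()


if __name__ == "__main__":
    try:
        current = read_cmdline()
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print(f"could not read /proc/cmdline: {exc}")
    else:
        state = "present" if has_boot_option(current) else "missing"
        print(f"'apic' is {state} on the kernel command line")
```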
Comment 11 Hurry Lin 2003-11-19 01:58:18 UTC
>you can verify if you passed the apic parameter
>correctly by checking /proc/cmdline in the running system. If there is no apic
>in there you passed it incorrectly and it cannot work.

For the attachments (acpidmp, lspci.txt), we used the dynamic IRQ routing 
method without "apic" on the bootloader command line. So that running system, 
which has no "apic" in "/proc/cmdline", is in PIC mode. The system still hangs 
when we use the dynamic routing method with "apic" on the bootloader command 
line. 
Comment 12 Georg Greve 2004-01-22 06:16:22 UTC
Related to

http://bugzilla.kernel.org/show_bug.cgi?id=1774

?
Comment 13 Hurry Lin 2004-03-04 22:47:37 UTC
Following Bug 1774, I updated the kernel on SuSE 9.0 beta 6 to 
linux-2.6.2-rc2.tar.bz2 and applied the patch, but the system still hangs at 
the same point during boot. 
Comment 14 Len Brown 2004-04-23 21:35:41 UTC
Please apply the 2.6.5 debug patch from bug 1581 and attach the resulting 
dmesg and /proc/interrupts using the "dynamic" PCI Interrupt Link 
version of the BIOS.  The patch ignores the device's enable bit (_STA), 
and also prints out some more info about the link. 
 
Comment 15 Hurry Lin 2004-04-30 06:46:46 UTC
With kernel 2.6.5 plus the latest patch from bug 1581 applied, the system 
boots successfully with the "dynamic" PCI Interrupt Link version of the BIOS.
Comment 16 Hurry Lin 2004-04-30 06:51:10 UTC
Created attachment 2761 [details]
dmesg
Comment 17 Hurry Lin 2004-04-30 06:52:03 UTC
Created attachment 2762 [details]
/proc/interrupts
Comment 18 Hurry Lin 2004-05-04 01:43:23 UTC
Applying the proposed patch from bug 1581 to kernel 2.6.5 fixes the system 
hang with the BIOS that uses the "dynamic" IRQ routing method.

Thanks for your kind support.

Which kernel version will include this patch?

And when this patch goes into the kernel, can you also remove the workaround 
that disables the APIC whenever a VIA chipset is recognized? 
Comment 19 Hurry Lin 2004-05-04 19:26:09 UTC
Created attachment 2791 [details]
For your reference, this patch can fix system hang issue from bug 1581.
Comment 20 Len Brown 2004-05-13 23:18:10 UTC
ACPI: _CRS 11 not found in _PRS 
LENB: extended resource for 20 
pci_link-0417 [30] acpi_pci_link_set     : Enabled IRQ 20, BIOS reported IRQ 11, using IRQ 20 
ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 20 
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xc9 -> IRQ 20 Mode:1 Active:1) 
00:00:0f[A] -> 2-20 -> IRQ 20 
 
This output shows that the VIA problem is a duplicate of bug 2567, 
which is fixed as part of the patch in bug 1581 (which is in linux 2.6.6 and 2.4.27) 
 
I.e., the BIOS bug is that _CRS returns an IRQ that is not listed in _PRS. 
In this case, even after _SRS 20, _CRS still returns 11.  In the past 
we believed _CRS; now we just print a warning and ignore it. 
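
As a rough illustration of that change in policy, here is a minimal Python sketch. It is not the kernel code: select_link_irq and its parameters are hypothetical names, and the _PRS list [20] is an assumption taken from the "extended resource for 20" line in the log above.

```python
def select_link_irq(possible, requested, reported, trust_crs=False):
    """Decide which IRQ to use for a PCI interrupt link.

    possible  -- IRQs listed by _PRS
    requested -- IRQ that was programmed with _SRS
    reported  -- IRQ that _CRS returns afterwards
    trust_crs -- True models the old behaviour (believe _CRS)
    Returns (irq, messages).
    """
    if requested not in possible:
        raise ValueError(f"IRQ {requested} is not listed in _PRS")
    messages = []
    if reported not in possible:
        messages.append(f"_CRS {reported} not found in _PRS")
    if reported != requested and not trust_crs:
        # New behaviour: warn about the mismatch and keep what was set.
        messages.append(
            f"Enabled IRQ {requested}, BIOS reported IRQ {reported}, "
            f"using IRQ {requested}"
        )
        return requested, messages
    return reported, messages


if __name__ == "__main__":
    irq, notes = select_link_irq(possible=[20], requested=20, reported=11)
    for note in notes:
        print(note)
    print("link enabled at IRQ", irq)
```

With trust_crs=True the same inputs return IRQ 11, which models the earlier behaviour of believing _CRS.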
 
Please build 2.6.6 (or 2.6.5+patch above)  with a big msgbuf: 
CONFIG_LOG_BUF_SHIFT=16 
attach the output from dmesg -s64000 
It will tell us if Linux sees any other issues with this BIOS. 
 
thanks, 
-Len 
 
Comment 21 Len Brown 2004-05-13 23:26:05 UTC
Created attachment 2856 [details]
2.6.6 patch

this patch removes the x86_64 IO-APIC disable workaround for both VIA and
NVIDIA.
check_ioapic() is not completely deleted b/c it looks like it is still used for
an iommu workaround.
Comment 22 Len Brown 2004-05-13 23:30:45 UTC
Created attachment 2857 [details]
2.4.26 patch

remove VIA/NVIDIA disable IOAPIC x86_64 workaround in 2.4.
As this is the only use of check_ioapic(), that routine is removed.
Comment 23 Andi Kleen 2004-05-13 23:40:26 UTC
I don't think the patch is correct - afaik nvidia is still broken 
and needs the workaround.

disabling it for via would be probably possible now, although we still
need the iommu workaround
Comment 24 Len Brown 2004-05-14 19:59:08 UTC
Created attachment 2868 [details]
2.6.6 patch -- take2

updated 2.6.6 patch:
VIA: retain IOMMU disable, delete IOAPIC disable
NVIDIA: retain IOAPIC disable
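
The policy of this take2 patch can be summarized as a small decision function. This Python sketch only illustrates the policy; it is not check_ioapic(). The function name quirks_for is hypothetical, and letting the "apic" boot option override the NVIDIA IOAPIC disable is an assumption made for the example. 0x1106 and 0x10DE are the PCI vendor IDs of VIA and NVIDIA.

```python
VENDOR_VIA = 0x1106     # PCI vendor ID of VIA Technologies
VENDOR_NVIDIA = 0x10DE  # PCI vendor ID of NVIDIA


def quirks_for(vendor_id: int, force_apic: bool = False):
    """Return (disable_ioapic, disable_iommu) for a chipset vendor.

    VIA:    retain IOMMU disable, no IOAPIC disable.
    NVIDIA: retain IOAPIC disable (assumed overridable with "apic").
    """
    disable_ioapic = vendor_id == VENDOR_NVIDIA and not force_apic
    disable_iommu = vendor_id == VENDOR_VIA
    return disable_ioapic, disable_iommu


if __name__ == "__main__":
    for name, vid in (("VIA", VENDOR_VIA), ("NVIDIA", VENDOR_NVIDIA)):
        ioapic_off, iommu_off = quirks_for(vid)
        print(f"{name}: disable IOAPIC={ioapic_off}, disable IOMMU={iommu_off}")
```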
Comment 25 Len Brown 2004-05-14 20:01:17 UTC
Created attachment 2869 [details]
2.4.26 patch -- take 2

updated patch:
VIA: delete IOAPIC disable (and leave a spot for adding IOMMU disable if
somebody wants to)
NVIDIA: retain IOAPIC disable
Comment 26 Hurry Lin 2004-05-20 04:34:29 UTC
Created attachment 2922 [details]
dmesg

The attachment is the dmesg output from kernel 2.6.6 rebuilt with a big msgbuf:
CONFIG_LOG_BUF_SHIFT=16.
Comment 27 Hurry Lin 2004-05-20 04:58:18 UTC
We rebuilt kernel 2.6.6 with the patch that retains the IOMMU disable and 
deletes the IOAPIC disable for VIA chipsets.

The system boots successfully and works in APIC mode without "apic" on the 
bootloader command line.
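
One way to confirm APIC mode on a running system is to look at the interrupt controller column of /proc/interrupts. The Python sketch below assumes the 2.4/2.6-era layout, where that column reads "IO-APIC-edge" or "IO-APIC-level" in APIC mode and "XT-PIC" in PIC mode. The sample text is made up for illustration and is not taken from the attachments; parse_interrupts and uses_ioapic are hypothetical helper names.

```python
# Hypothetical sample, not the attached /proc/interrupts.
SAMPLE_APIC = """\
           CPU0
  0:     100000    IO-APIC-edge  timer
 21:        500   IO-APIC-level  uhci_hcd
NMI:          0
"""

SAMPLE_PIC = """\
           CPU0
  0:     100000          XT-PIC  timer
 11:        500          XT-PIC  uhci_hcd
"""


def parse_interrupts(text: str) -> dict:
    """Map each numeric IRQ to the name of its interrupt controller."""
    table = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header line or non-numeric rows such as NMI
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counts
        if i < len(fields):
            table[int(head)] = fields[i]
    return table


def uses_ioapic(text: str) -> bool:
    """Return True if any IRQ is routed through the IO-APIC."""
    return any(c.startswith("IO-APIC") for c in parse_interrupts(text).values())


if __name__ == "__main__":
    print("APIC sample:", uses_ioapic(SAMPLE_APIC))
    print("PIC sample:", uses_ioapic(SAMPLE_PIC))
```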

Which kernel version will include this patch that deletes the IOAPIC disable 
for VIA chipsets? 

Thanks!


Comment 28 Andi Kleen 2004-05-20 05:01:35 UTC
Did you test it on all VIA chipsets that support x86-64 and 
possibly Intel Prescott? (for EM64T CPUs - the latter can be done
with 32bit kernels too as long as you make sure the APIC was enabled) 

If that all works I can remove the test for VIA chipset for 2.6.7.
Comment 29 Len Brown 2004-05-20 21:06:40 UTC
This is checked in on top of 2.4.27-pre3 
and will also be in 2.6.7-rc1 
 
closing.  If regressions are seen we'll back this out and re-open the bug report.