Bug 2665

Summary: Re: hdc: lost interrupt ide-cd: cmd 0x3 timed out ...
Product: ACPI
Reporter: Alex Riesen (raa.lkml)
Component: Config-Interrupts
Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, as, marcus_brodi, richlv
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.6-rc3-bk8
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Output of acpidmp
lspci -vv
dmesg
Alt-Sysrq-T for modprobe
patch vs 2.6.5
acpidmp output on Anssi's system
Dmesg output on Anssi's system
lspci -vv output on Anssi's system
dmesg with 2.6.6 vanilla and patch from this bug report

Description Alex Riesen 2004-05-09 23:52:02 UTC
Hardware Environment: SiS961
Problem Description:
modprobing ide-cd hangs on SiS961.
Comment 1 Alex Riesen 2004-05-09 23:52:48 UTC
Created attachment 2829 [details]
Output of acpidmp
Comment 2 Alex Riesen 2004-05-09 23:54:32 UTC
Created attachment 2830 [details]
lspci -vv
Comment 3 Alex Riesen 2004-05-10 00:00:12 UTC
Created attachment 2831 [details]
dmesg
Comment 4 Alex Riesen 2004-05-10 00:00:45 UTC
Created attachment 2832 [details]
Alt-Sysrq-T for modprobe
Comment 5 Len Brown 2004-05-10 10:58:25 UTC
Created attachment 2840 [details]
patch vs 2.6.5

The problem is triggered by a BIOS bug which sets the current IRQ
to a value outside the list of possible IRQs:

ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *9

ACPI handles this and selects a value from the possible list.
But since the active IRQ is set, Linux erroneously doesn't
look at its rules for deciding which IRQ in the list to select.
It chooses 15, which turns out to be a bad idea.

ACPI: PCI Interrupt Link [LNKF] enabled at IRQ 15

The fix is to simply forget the illegal current setting (IRQ 9)
and behave as if the BIOS didn't give us any current value.
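
The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the kernel patch itself: the names sanitize_active_irq, select_irq and example_penalty are hypothetical, and the penalty values are made up for the example.

```python
from typing import Callable, Optional, Sequence


def sanitize_active_irq(possible: Sequence[int],
                        active: Optional[int]) -> Optional[int]:
    """Keep the BIOS-reported IRQ only if it is in the possible list."""
    if active is not None and active in possible:
        return active
    # Illegal current setting: behave as if the BIOS gave no value.
    return None


def select_irq(possible: Sequence[int],
               active: Optional[int],
               penalty: Callable[[int], int]) -> int:
    """Use a valid active IRQ, otherwise fall back to the selection rules."""
    active = sanitize_active_irq(possible, active)
    if active is not None:
        return active
    # Simplified stand-in for the selection rules: lowest penalty wins.
    return min(possible, key=penalty)


# LNKF from this report: possible IRQs, with the illegal current value 9.
LNKF_POSSIBLE = [3, 4, 5, 6, 7, 10, 11, 12, 14, 15]


def example_penalty(irq: int) -> int:
    # Made-up penalties: discourage IRQs commonly taken by legacy devices.
    return 10 if irq in (12, 14, 15) else 0


chosen = select_irq(LNKF_POSSIBLE, 9, example_penalty)
```

With the illegal value dropped, the choice is driven by the penalty function rather than by whatever the BIOS left behind, so IRQ 15 is no longer picked in this toy setup.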
Comment 6 Cyrille Ch 2004-05-11 23:08:15 UTC
Same problem here, in a non-SiS environment, the proposed patch also fixes the
problem for me. 

Thanks Len!


-[00]-+-00.0  VIA Technologies, Inc. VT8363/8365 [KT133/KM133]
      +-01.0-[01]----00.0  ATI Technologies Inc Radeon RV200 QW [Radeon 7500]
      +-07.0  VIA Technologies, Inc. VT82C686 [Apollo Super South]
      +-07.1  VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE
      +-07.2  VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
      +-07.3  VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
      +-07.4  VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
      +-09.0  Ensoniq ES1371 [AudioPCI-97]
      +-0d.0  LSI Logic / Symbios Logic 53c810
      +-0f.0  3Com Corporation 3c905B 100BaseTX [Cyclone]
      \-11.0  3Com Corporation 3c905B 100BaseTX [Cyclone]
Comment 7 Len Brown 2004-05-13 22:37:06 UTC
*** Bug 2689 has been marked as a duplicate of this bug. ***
Comment 8 Len Brown 2004-05-17 19:45:06 UTC
Integrated on top of 2.4.27-pre2 and 2.6.6,
i.e. it will show up in 2.4.27-pre3 and 2.6.7-rc1.
Closing.
Comment 9 Len Brown 2004-06-23 23:16:41 UTC
*** Bug 2888 has been marked as a duplicate of this bug. ***
Comment 10 Anssi Saari 2004-06-28 01:51:07 UTC
The patch against 2.6.5, which apparently made it into 2.6.6-bk2, makes my
system unbootable. I reported my problem on the lkml; see
http://marc.theaimsgroup.com/?l=linux-kernel&m=108793753409268&w=2
Comment 11 Anssi Saari 2004-06-28 03:08:00 UTC
Created attachment 3264 [details]
acpidmp output on Anssi's system
Comment 12 Anssi Saari 2004-06-28 03:09:51 UTC
Created attachment 3265 [details]
Dmesg output on Anssi's system

This is from running 2.6.7 with the patch removed.
Comment 13 Anssi Saari 2004-06-28 03:10:39 UTC
Created attachment 3266 [details]
lspci -vv output on Anssi's system
Comment 14 Len Brown 2004-06-28 09:49:37 UTC
Anssi, re comment #10:
the e-mail at that link mentions that the problem is unchanged with acpi=off.
Is that accurate?
 
Comment 15 Len Brown 2004-06-28 10:22:04 UTC
lspci shows PCI-id/pin for CMD:  
00:09.0 RAID bus controller: CMD Technology Inc PCI0649 (rev 02) - pinA  
  
acpidmp DSDT shows _PRT entry uses LNKC: 
Package (0x04) { 0x0009FFFF, 0x00, \_SB.PCI0.LNKC, 0x00 }  
  
dmesg_patch_removed:  
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 11 12) *5  
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 12  
CMD649: 100% native mode on irq 12  
  
dmesg linux_2.6.7_boot_hang:  
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 11 12) *5  
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10 
ACPI: PCI interrupt 0000:00:09.0[A] -> GSI 10 (level, low) -> IRQ 10 
CMD649: 100% native mode on irq 10 
 
So in the old days, /proc/interrupts would probably show this 
device happy on IRQ5.  For a short time we moved it to IRQ12 
(the subject of this bug report), but apparently there is no PS2 
mouse on this system so that bug wasn't noticed.  That bug 
was fixed and moved CMD to IRQ10, where it dies. 
 
While the patch in this bug report changes the behaviour 
of this system, I don't think it is the root cause of this 
failure.  Perhaps you can apply just the patch above 
to a vanilla 2.6.6 kernel and see if it moves the CMD 
to IRQ10, and if it works in that kernel. 
 
If yes, then we broke this system later in 2.6.7, probably 
with the fix for bug #2574 vs PIC mode PCI. 
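
The dmesg lines quoted above can be checked mechanically for a current IRQ that lies outside the possible list. The following is a minimal sketch that assumes the exact line layout shown in this report (possible IRQs in parentheses, current IRQ after a trailing "*"); other kernels may print the line differently. The function names are hypothetical.

```python
import re

# Matches lines such as:
#   ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 11 12) *5
LINK_RE = re.compile(
    r"ACPI: PCI Interrupt Link \[(\w+)\] \(IRQs ([\d ]+)\) \*(\d+)"
)


def parse_link_line(line):
    """Return (link name, possible IRQs, current IRQ), or None if no match."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    possible = [int(tok) for tok in m.group(2).split()]
    return m.group(1), possible, int(m.group(3))


def current_outside_possible(line):
    """True if the line reports a current IRQ missing from the possible list."""
    parsed = parse_link_line(line)
    if parsed is None:
        return False
    _, possible, current = parsed
    return current not in possible


LNKC_LINE = "ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 11 12) *5"
LNKC_FLAGGED = current_outside_possible(LNKC_LINE)
```

For the LNKC line, IRQ 5 is not among the possible IRQs, so the line is flagged, just like the LNKF line from comment 5.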
Comment 16 Anssi Saari 2004-06-28 10:59:19 UTC
Re: comment 14: Indeed, with acpi=off I didn't get the problem. Sorry. The CMD
goes to IRQ 5 then, as you said.

Re: comment 15: I tried 2.6.6 vanilla with just the patch in this bug report and
it also puts the CMD on IRQ 10, as does 2.6.6-bk1 with the same patch.
2.6.6-bk2 and later include the patch and show the same problem. I captured
the boot messages again from 2.6.6 and will attach them as well.

2.6.6 with the patch from this bug report does boot, but /proc/interrupts
doesn't show the CMD anywhere and actually trying to access a disk attached to
it causes a hang.
Comment 17 Anssi Saari 2004-06-28 11:00:38 UTC
Created attachment 3270 [details]
dmesg with 2.6.6 vanilla and patch from this bug report
Comment 18 Len Brown 2004-06-28 23:27:02 UTC
Thanks for testing 2.6.6+ patch above. 
This proves that the issue *is* IRQ10, and not the 2.6.7 mp_parse_prt() changes. 
 
One possibility is that IRQ10 is broken on this motherboard. 
Boot (latest unpatched kernel) with "acpi_irq_isa=10" and this should 
move the interrupt someplace else, probably IRQ11. 
However, I expect the MB is not broken and that we'll see 
the problem simply move to IRQ11. 
 
I notice that this device uses LNKA, which is also programmed to IRQ10: 
00:0a.0 SCSI storage controller: LSI Logic / Symbios Logic (formerly NCR) 53c810 (rev 02) 
 
Is this device being used?  Is it possible to disable it in the BIOS? 
One way to debug this is to disable all possible devices and see if 
the problem goes away, then see which device caused the problem. 
 
Please include the /proc/interrupts for the success and (if you can) the failure cases. 
 
Comment 19 Len Brown 2004-06-28 23:49:13 UTC
I should add, Anssi, if you have a WinXP boot disk, it would be interesting 
to see where Windows assigns the IRQs on this system. 
Comment 20 Anssi Saari 2004-06-29 14:00:31 UTC
Here are the IRQ assignments from Windows XP:

IRQ 0   System timer    OK
IRQ 1   Standard 101/102-Key or Microsoft Natural PS/2 Keyboard OK
IRQ 3   Communications Port (COM2)      OK
IRQ 4   Communications Port (COM1)      OK
IRQ 5           OK
IRQ 6   Standard floppy disk controller OK
IRQ 7   CMI8738/C3DX PCI Audio Device   OK
IRQ 7   CMD PCI-0649 Ultra DMA IDE Controller   OK
IRQ 8   System CMOS/real time clock     OK
IRQ 9   Microsoft ACPI-Compliant System OK
IRQ 10  MPU-401 Compatible MIDI Device  OK
IRQ 11  SAPPHIRE RADEON 9600 PRO ATLANTIS       OK
IRQ 11  LSI Logic 53C810 Device OK
IRQ 11  VIA Rev 5 or later USB Universal Host Controller        OK
IRQ 11  VIA Rev 5 or later USB Universal Host Controller        OK
IRQ 11  VIA Rev 5 or later USB Universal Host Controller        OK
IRQ 11  VIA Rev 5 or later USB Universal Host Controller        OK
IRQ 12  Broadcom NetXtreme Gigabit Ethernet     OK
IRQ 13  Numeric data processor  OK
IRQ 14  Primary IDE Channel     OK
IRQ 15  Secondary IDE Channel   OK

And that really tells the tale. The thing on IRQ 10 is part of the Winbond
Super-IO chip on the board, and the MIDI port is enabled by default and set to
use IRQ 10. Giving the acpi_irq_isa=10 option worked fine, as did disabling the
MIDI device in BIOS setup. IRQs get assigned like this then:

            CPU0       
  0:     331726          XT-PIC  timer
  1:        935          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  7:       1126          XT-PIC  parport0
  8:          1          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:         31          XT-PIC  ide2, ehci_hcd
 11:       7670          XT-PIC  uhci_hcd, uhci_hcd, uhci_hcd, uhci_hcd, eth0, sym53c8xx
 14:       7668          XT-PIC  ide0
 15:          1          XT-PIC  ide1
NMI:          0 
LOC:     331534 
ERR:        217
MIS:          0
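
A /proc/interrupts listing like the one above can be turned into a table of IRQ numbers and handler names, which makes it easy to see whether a driver registered a handler at all and which IRQs are shared. This is a rough sketch that assumes a single-CPU layout (one count column, then a one-word controller name, then the handlers); the function name and the sample text are illustrative only.

```python
def parse_proc_interrupts(text):
    """Map each numeric IRQ to the list of handler names registered on it."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skips the CPU header and the NMI/LOC/ERR/MIS rows
        # Single-CPU assumption: count, controller name, then handler names.
        fields = rest.split(None, 2)
        handlers = fields[2] if len(fields) > 2 else ""
        table[int(label)] = [h.strip() for h in handlers.split(",") if h.strip()]
    return table


SAMPLE = """\
            CPU0
  0:     331726          XT-PIC  timer
 10:         31          XT-PIC  ide2, ehci_hcd
 14:       7668          XT-PIC  ide0
NMI:          0
"""

irq_table = parse_proc_interrupts(SAMPLE)
shared_irqs = sorted(irq for irq, names in irq_table.items() if len(names) > 1)
```

A driver that never shows up in the resulting table, as reported for the CMD controller in comment 16, has no interrupt handler registered.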

Now, I suppose if the midi port is enabled and set to IRQ 10, then it's actually
quite right to give the acpi_irq_isa=10 parameter, isn't it? Or do the
corresponding thing in bios setup. I guess Windows XP works because it
understands a little more about ISA devices than Linux?

By the way, I thought this VIA chipset (KT600 with 8237 southbridge) has an
IO-APIC, but apparently not? Or maybe it's just disabled on this board?
Comment 21 Len Brown 2004-06-29 15:25:13 UTC
> Now, I suppose if the midi port is enabled and set to IRQ 10, then it's
> actually 
> quite right to give the acpi_irq_isa=10 parameter, isn't it? 
 
yes, it is a valid workaround -- though Linux should figure this 
out automatically... 
 
> Or do the corresponding thing in bios setup. 
 
Yes, disabling it in the BIOS is simpler.  Apparently there is no Linux 
driver bound to this device so you're not using it? 
 
> I guess Windows XP works because it 
> understands a little more about ISA devices than Linux? 
 
Yes, apparently Windows is parsing the DSDT and finding this motherboard 
device: 
 
Device (MIDI) { 
	Name (_HID, EisaId ("PNPB006")) 
 
And Linux is ignoring this information, and the IRQ that the device claims. 
So the problem on this board is actually the one reported in bug #2733 -- 
just that this bug fix exposed it.  So I'm re-closing this one. 
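
The effect of reserving an IRQ for an ISA device, whether by hand via acpi_irq_isa=10 or automatically from a device that claims the IRQ in the DSDT, can be modelled as a penalty added to that IRQ before the link IRQ is chosen. The sketch below is a toy model under stated assumptions: the base penalties and the ISA_USED_PENALTY constant are invented for illustration and do not reproduce the kernel's actual values or tie-breaking.

```python
ISA_USED_PENALTY = 1000  # hypothetical weight for an IRQ claimed by an ISA device


def build_penalties(base, isa_claimed):
    """Return a copy of base with a large penalty added to each claimed IRQ."""
    penalties = dict(base)
    for irq in isa_claimed:
        penalties[irq] = penalties.get(irq, 0) + ISA_USED_PENALTY
    return penalties


def pick_link_irq(possible, penalties):
    """Pick the possible IRQ with the lowest penalty (first one wins ties)."""
    return min(possible, key=lambda irq: penalties.get(irq, 0))


# LNKC from this report; the base penalties are invented for the example.
LNKC_POSSIBLE = [3, 4, 6, 7, 10, 11, 12]
BASE = {3: 5, 4: 5, 6: 5, 7: 5, 10: 0, 11: 1, 12: 5}

without_reservation = pick_link_irq(LNKC_POSSIBLE, BASE)
with_midi_reserved = pick_link_irq(LNKC_POSSIBLE, build_penalties(BASE, [10]))
```

In this model the link lands on IRQ 10 when nothing is known about the MIDI port and moves elsewhere once IRQ 10 is marked as claimed, which is the behaviour the workaround relies on.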
 
> By the way, I thought this VIA chipset (KT600 with 8237 southbridge) has an 
> IO-APIC, but apparently not? Or maybe it's just disabled on this board? 
 
The ACPI table headers at the top of dmesg do not list an MADT,
so Linux will not find an IO-APIC.  Dunno if this chipset has one -- but
you may find that there is a BIOS option to enable/disable it if
there is one physically present.