Bug 4711

Summary: ACPI - System hang, loss of keyboard, Controller using wrong IRQ
Product: ACPI Reporter: Parag Warudkar (kernel-stuff)
Component: Config-InterruptsAssignee: Len Brown (lenb)
Status: CLOSED CODE_FIX    
Severity: normal CC: acpi-bugzilla, bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.12-rc5 Subsystem:
Regression: --- Bisected commit-id:
Attachments: dmidecode output
biosdecode output
DMESG output
acpidmp output
Picture showing IRQ assignments
lspci -vv output
lspci -vv output with acpi=off (no hang case)
/proc/interrupts with acpi=off

Description Parag Warudkar 2005-06-04 13:52:16 UTC
Distribution: Gentoo (doesn't affect the bug occurence)   
Hardware Environment: Athlon64, nvidia chipset (Compaq R3240US Laptop)   
Software Environment:   
Problem Description:   
With acpi enabled, there are 3 problems with the laptop -  
 
1) The laptop hangs during boot for 2 minutes after which it recovers. Hang  
happens if a usb-storage device is attached to the laptop. I have traced the  
hang and it happens in pci_enable_device() for the controller to which the  
storage device is attached. (It is called by ohci_hcd module's probe()  
function)  
 
2) If the laptop is *restarted* (without shutting off) from Windows XP into 
Linux, the keyboard doesn't get detected. 
 
3) Sometimes without any setup change I get the following message in logs and 
no USB device works when this happens- 
 [ 
Comment 1 Parag Warudkar 2005-06-04 14:04:20 UTC
Created attachment 5124 [details]
dmidecode output
Comment 2 Parag Warudkar 2005-06-04 14:05:09 UTC
Created attachment 5125 [details]
biosdecode output
Comment 3 Parag Warudkar 2005-06-04 14:05:50 UTC
All the listed problems go away if acpi=off is passed on kernel command line. 
Comment 4 Shaohua 2005-06-05 22:20:57 UTC
>ACPI: PCI Interrupt 0000:00:02.0[P]: no GSI - using IRQ 11 
How could a PCI device's pin be pin 'P'?
Could you please post detail information, such as dmesg, acpidmp?
Comment 5 Parag Warudkar 2005-06-06 04:21:52 UTC
Created attachment 5129 [details]
DMESG output
Comment 6 Parag Warudkar 2005-06-06 04:23:46 UTC
Created attachment 5130 [details]
acpidmp output

acpidmp output
Comment 7 Sérgio M Basto 2005-06-06 07:50:37 UTC
is this one "via boards with IOAPIC" ?
if yes, see bug http://bugme.osdl.org/show_bug.cgi?id=3319.
Please attach "cat /proc/interrupts"
Comment 8 Parag Warudkar 2005-06-06 10:08:27 UTC
S
Comment 9 Shaohua 2005-06-06 19:24:28 UTC
>PCI Interrupt 0000:00:02.0[P]: no GSI - using IRQ 11
Looks the OHCI device isn't initialized by BIOS or maybe isn't power on. So 
sometimes it causes hang. And sometimes, its interrupt pin is 'P'.
What's the interrupt assignment in Win? Can you track down which step in 
pci_enable_device cause the hang, pci_set_power_state or 
pcibios_enable_device? And also please try boot option 'pci=routeirq'. Thanks!
Comment 10 Parag Warudkar 2005-06-06 20:14:55 UTC
See the attached IRQ-Usb.PNG for Win IRQ Assignment :) The device is powered on
- only then the hang occurs. I will track it down to what exactly causes the
hang inside pci_enable_device(). Thanks.
Comment 11 Parag Warudkar 2005-06-06 20:15:47 UTC
Created attachment 5134 [details]
Picture showing IRQ assignments
Comment 12 Parag Warudkar 2005-06-07 12:47:38 UTC
Ok. pci=routeirq did not make a difference - pci_enable_device() still hangs 
while booting. Here is the relevant portion of dmseg with pci=routeirq 
 
PCI: Using ACPI for IRQ routing 
** Routing PCI interrupts for all devices because "pci=routeirq" 
** was specified.  If this was required to make a driver work, 
** please email the output of "lspci" to bjorn.helgaas@hp.com 
** so I can fix the driver. 
ACPI: PCI Interrupt Link [LSMB] enabled at IRQ 22 
ACPI: PCI interrupt 0000:00:01.1[A] -> GSI 22 (level, low) -> IRQ 22 
ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 21 
ACPI: PCI interrupt 0000:00:02.0[A] -> GSI 21 (level, low) -> IRQ 21 
ACPI: PCI Interrupt Link [LUS1] enabled at IRQ 20 
ACPI: PCI interrupt 0000:00:02.1[B] -> GSI 20 (level, low) -> IRQ 20 
ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 22 
ACPI: PCI interrupt 0000:00:02.2[C] -> GSI 22 (level, low) -> IRQ 22 
ACPI: PCI Interrupt Link [LACI] enabled at IRQ 21 
ACPI: PCI interrupt 0000:00:06.0[A] -> GSI 21 (level, low) -> IRQ 21 
ACPI: PCI Interrupt Link [LMCI] enabled at IRQ 20 
ACPI: PCI interrupt 0000:00:06.1[B] -> GSI 20 (level, low) -> IRQ 20 
ACPI: PCI Interrupt Link [LNK1] enabled at IRQ 19 
ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 19 (level, low) -> IRQ 19 
ACPI: PCI Interrupt Link [LNK2] enabled at IRQ 18 
ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 18 
ACPI: PCI Interrupt Link [LNK3] enabled at IRQ 17 
ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 17 (level, low) -> IRQ 17 
ACPI: PCI interrupt 0000:02:04.0[A] -> GSI 19 (level, low) -> IRQ 19 
ACPI: PCI interrupt 0000:02:04.1[B] -> GSI 18 (level, low) -> IRQ 18 
ACPI: PCI Interrupt Link [LNK5] enabled at IRQ 16 
ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 
Comment 13 Parag Warudkar 2005-06-07 12:50:45 UTC
cat /proc/interrupts 
-------------------- 
          CPU0 
  0:     516363    IO-APIC-edge  timer 
  1:       1071    IO-APIC-edge  i8042 
  8:          0    IO-APIC-edge  rtc 
  9:         16   IO-APIC-level  acpi 
 12:       3261    IO-APIC-edge  i8042 
 14:      10031    IO-APIC-edge  ide0 
 15:      30822    IO-APIC-edge  ide1 
 18:        379   IO-APIC-level  eth0 
 19:          3   IO-APIC-level  ohci1394 
 20:       1439   IO-APIC-level  ohci_hcd, NVidia nForce3 Modem 
 21:          0   IO-APIC-level  ohci_hcd, NVidia nForce3 
 22:      12884   IO-APIC-level  ehci_hcd 
NMI:         60 
LOC:     516210 
ERR:          0 
MIS:          0 
 
Comment 14 Parag Warudkar 2005-06-08 05:40:28 UTC
I tried to isolate the hang - it happens inside pcibios_enable_device(). 
Comment 15 Parag Warudkar 2005-06-25 22:11:19 UTC
Hmm.. Interestingly I booted the machine today under the same conditions when   
it hangs on boot and it did not hang while booting. More interestingly,   
the /proc/interrupts output is different than the case where it hangs -   
Normally (when it hangs on boot) IRQ assignment looks like this -  
  
18:        379   IO-APIC-level  eth0   
 19:          3   IO-APIC-level  ohci1394   
 20:       1439   IO-APIC-level  ohci_hcd, NVidia nForce3 Modem   
 21:          0   IO-APIC-level  ohci_hcd, NVidia nForce3   
 22:      12884   IO-APIC-level  ehci_hcd   
  
Today when it did not hang and everything still worked fine here is what the  
relevant differences are -  
  
 16:          3   IO-APIC-level  ohci1394   
 18:      49277   IO-APIC-level  nvidia   
 19:       1753   IO-APIC-level  eth0   
 20:       6253   IO-APIC-level  ohci_hcd   
 21:        646   IO-APIC-level  ohci_hcd, NVidia nForce3   
 22:       2225   IO-APIC-level  ehci_hcd, NVidia nForce3 Modem   
NMI:         80   
LOC:     840535   
ERR:          0   
MIS:          0   
 
Questions - Why would the IRQ Assignment change between boots and why did it 
not hang with the later assignment? What can be done to keep this assignment 
and will it avoid the hang if kept same? 
Comment 16 Shaohua 2005-06-26 18:41:30 UTC
Maybe you can find some rules, :). is there any difference in the dmesg in the 
non-hang case?
Comment 17 Shaohua 2005-07-03 20:09:23 UTC
This patch
http://marc.theaimsgroup.com/?l=linux-kernel&m=112018510921263&w=2
might help you, could you please try? Thanks!
Comment 18 Parag Warudkar 2005-07-10 07:24:41 UTC
I tried an updated version of the same patch you mentioned. It doesn't help. 
Comment 19 Parag Warudkar 2005-07-11 19:30:21 UTC
Hello! Is there a way to reliably force the IRQ assignment to be always -  
  
 20:       6253   IO-APIC-level  ohci_hcd     
 21:        646   IO-APIC-level ohci_hcd, NVidia nForce3     
 22:       2225   IO-APIC-level ehci_hcd, NVidia nForce3 Modem     
  
I am thinking the controller doesn't like the normal IRQ assignment which ACPI 
chooses. If I had to modify the code to assign IRQs as above, where should I 
start looking to begin with? 
Comment 20 Parag Warudkar 2005-07-12 20:32:39 UTC
Windows IRQ Assignment - Strangely enough, IRQ 19 is missing!

IRQ 0	System timer	OK
IRQ 1	Standard 101/102-Key or Microsoft Natural PS/2 Keyboard	OK
IRQ 6	Standard floppy disk controller	OK
IRQ 8	System CMOS/real time clock	OK
IRQ 9	Microsoft ACPI-Compliant System	OK
IRQ 10	NVIDIA nForce PCI System Management	OK
IRQ 12	Microsoft PS/2 Port Mouse (IntelliPoint)	OK
IRQ 13	Numeric data processor	OK
IRQ 14	NVIDIA nForce3 Parallel ATA Controller (v2.6)	OK
IRQ 15	NVIDIA nForce3 Parallel ATA Controller (v2.6)	OK
IRQ 16	Texas Instruments PCI-1620 CardBus Controller with UltraMedia.	OK
IRQ 16	NVIDIA GeForce4 440 Go 64M	OK
IRQ 17	Broadcom 802.11b/g WLAN	OK
IRQ 18	Texas Instruments OHCI Compliant IEEE 1394 Host Controller	OK
IRQ 18	Texas Instruments PCI-1620 CardBus Controller with UltraMedia.	OK
IRQ 20	Standard OpenHCD USB Host Controller	OK
IRQ 20	SoundMAX Integrated Digital Audio	OK
IRQ 21	Standard OpenHCD USB Host Controller	OK
IRQ 21	Agere Systems AC'97 Modem	OK
IRQ 22	Standard Enhanced PCI to USB Host Controller	OK
Comment 21 Shaohua 2005-07-12 22:05:02 UTC
Can you attach the 'lspci -vv' output from the broken case? I'm still confused 
why the pin is 'P'.
Comment 22 Parag Warudkar 2005-07-13 11:02:00 UTC
The "PCI Interrupt 0000:00:02.0[P]: no GSI" problem is very rare - I haven't 
recently encountered that problem. Routinely, the pin is almost always A.

My real problem is the 2 minute hang while device is attached and the USB 
controller is assigned IRQ other than 20.
I also tried (just for the heck of it) doing dev->irq = 20 before hcd-pci.c 
calls pci_enable_device - it doens't hang in that case. So this almost proves 
that the controller doesn't like any IRQ other than 20 to be assigned to it. 
Now only if we could look at why Linux's IRQ assignment is different than 
Windows and why ACPI chooses different IRQs at different times, I think we can 
crack this.

 I will post the lspci -vv output regardless though. Thanks.
Comment 23 Parag Warudkar 2005-07-13 15:51:39 UTC
Created attachment 5324 [details]
lspci -vv output
Comment 24 Parag Warudkar 2005-07-13 15:55:07 UTC
Created attachment 5325 [details]
lspci -vv output with acpi=off (no hang case)
Comment 25 Parag Warudkar 2005-07-13 15:59:11 UTC
Created attachment 5326 [details]
/proc/interrupts with acpi=off
Comment 26 Parag Warudkar 2005-07-13 16:46:41 UTC
Some more info -   
  
The laptop has 3 USB ports - 1 legacy USB1.1 and 2 USB 2.0 ports. If any USB  
device is connected to any one or both of the two USB 2.0 ports, only then the  
load of ohci-hcd hangs the system during bootup. Attaching device to the legacy  
port doesnt hang the system by itself when ohci-hcd is loaded. 
  
This is the weirdest problem I ever had to debug! :(  
Comment 27 Parag Warudkar 2005-08-13 15:05:31 UTC
I managed to get a call trace while the kernel hangs -

ohci_hcd: 2004 Nov 08 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
PCI: Enabling device 0000:00:02.0 (0004 -> 0006)
BUG: soft lockup detected on CPU#0!

Modules linked in: ohci_hcd i2c_nforce2 i2c_core snd_intel8x0m shpchp snd_intel8
x0 snd_ac97_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_d
evice snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc 8
139too mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_nv libata s
d_mod usb_storage scsi_mod ehci_hcd
Pid: 1115, comm: modprobe Not tainted 2.6.11-1.1369_FC4
RIP: 0010:[<ffffffff801703da>] <ffffffff801703da>{handle_IRQ_event+26}
RSP: 0018:ffffffff8050a980  EFLAGS: 00000246
RAX: 0000000000000001 RBX: ffff81005faa22c0 RCX: 000000000000003d
RDX: ffff81005faa22c0 RSI: ffffffff8050aa08 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff81005fbb3d9c
R10: 0000000000000046 R11: ffffffff8033f080 R12: 0000000000000001
R13: ffffffff8050aa08 R14: 0000000000000001 R15: 00000000005160f8
FS:  00002aaaaadfd6e0(0000) GS:ffffffff80576c00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaad698f0 CR3: 000000005ed3c000 CR4: 00000000000006e0

Call Trace: <IRQ> <ffffffff801705fd>{__do_IRQ+477} <ffffffff880049ff>{:ehci_hcd:
ehci_work+47}
       <ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
       <ffffffff8033f080>{pci_conf1_read+0} <ffffffff801703da>{handle_IRQ_event+
26}
       <ffffffff801705fd>{__do_IRQ+477} <ffffffff8013c54e>{profile_tick+78}
       <ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
        <EOI> <ffffffff8033f080>{pci_conf1_read+0} <ffffffff8024bb91>{pci_bus_re
ad_config_byte+353}
       <ffffffff802851ce>{acpi_pci_irq_enable+79} <ffffffff8033ec0c>{pcibios_ena
ble_resources+236}
       <ffffffff8024ebb6>{pci_enable_device_bars+38} <ffffffff8024ebeb>{pci_enab
le_device+27}
       <ffffffff8031b3c3>{usb_hcd_pci_probe+83} <ffffffff80250a67>{pci_device_pr
obe+103}
       <ffffffff802507aa>{pci_bus_match+42} <ffffffff802c503d>{driver_probe_devi
ce+77}
       <ffffffff802c50b6>{driver_attach+70} <ffffffff802c51a9>{bus_add_driver+16
9}
       <ffffffff80250430>{pci_register_driver+160} <ffffffff80165765>{sys_init_m
odule+277}
       <ffffffff8010f0fe>{system_call+126}
Call Trace: <IRQ> <ffffffff8017012d>{softlockup_tick+285}
<ffffffff80113fc1>{timer_interrupt+913}
       <ffffffff801703ec>{handle_IRQ_event+44} <ffffffff801705fd>{__do_IRQ+477}
       <ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
       <ffffffff8033f080>{pci_conf1_read+0} <ffffffff801703da>{handle_IRQ_event+26}
       <ffffffff801705fd>{__do_IRQ+477} <ffffffff880049ff>{:ehci_hcd:ehci_work+47}
       <ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
       <ffffffff8033f080>{pci_conf1_read+0} <ffffffff801703da>{handle_IRQ_event+26}
       <ffffffff801705fd>{__do_IRQ+477} <ffffffff8013c54e>{profile_tick+78}
       <ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
        <EOI> <ffffffff8033f080>{pci_conf1_read+0}
<ffffffff8024bb91>{pci_bus_read_config_byte+353}
       <ffffffff802851ce>{acpi_pci_irq_enable+79}
<ffffffff8033ec0c>{pcibios_enable_resources+236}
       <ffffffff8024ebb6>{pci_enable_device_bars+38}
<ffffffff8024ebeb>{pci_enable_device+27}
       <ffffffff8031b3c3>{usb_hcd_pci_probe+83}
<ffffffff80250a67>{pci_device_probe+103}
       <ffffffff802507aa>{pci_bus_match+42}
<ffffffff802c503d>{driver_probe_device+77}
       <ffffffff802c50b6>{driver_attach+70} <ffffffff802c51a9>{bus_add_driver+169}
       <ffffffff80250430>{pci_register_driver+160}
<ffffffff80165765>{sys_init_module+277}
       <ffffffff8010f0fe>{system_call+126}
ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 22
Comment 28 Parag Warudkar 2005-12-17 13:30:20 UTC
Whoa! These problems automatically stopped recently when I upgraded the laptop from 2.6.11 to 2.6.15-
rc5. Don't know what exactly fixed these issues and in which kernel release. Closing since it's no longer 
reproducible.