Bug 8853 - irq NN: nobody cared on ThinkPad X61T,T61
Summary: irq NN: nobody cared on ThinkPad X61T,T61
Status: REJECTED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: All Linux
Importance: P1 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-07 04:10 UTC by Jan Gutter
Modified: 2008-02-27 22:51 UTC (History)
28 users

See Also:
Kernel Version: 2.6.23-rc2 and earlier
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg of 2.6.23-rc2 (24.73 KB, text/plain)
2007-08-07 04:14 UTC, Jan Gutter
ACPI dump of TP X61T (287.96 KB, text/plain)
2007-09-10 03:21 UTC, Jan Gutter
Output of acpidump running on a Thinkpad T61P (6460-6XG) (313.98 KB, text/plain)
2007-09-10 05:39 UTC, Klaus S. Madsen
X61T: dmesg with pci=noacpi apic=debug (24.65 KB, text/plain)
2007-09-10 23:43 UTC, Jan Gutter
X61T: /proc/interrupts with pci=noacpi apic=debug (774 bytes, text/plain)
2007-09-10 23:44 UTC, Jan Gutter
X61T: lspci -v with pci=noacpi apic=debug (7.79 KB, text/plain)
2007-09-10 23:45 UTC, Jan Gutter
X61T: /proc/interrupts with acpi=noirq (774 bytes, text/plain)
2007-09-11 02:49 UTC, Jan Gutter
X61T: Interrupt assignment in Vista Business (32-bit) (1.27 KB, text/plain)
2007-09-11 02:52 UTC, Jan Gutter
X61T: kernel config for the next 3 attachments (55.19 KB, text/plain)
2007-09-16 09:21 UTC, Jan Gutter
X61T: dmesg with apic=debug initcall_debug debug (96.08 KB, text/plain)
2007-09-16 09:23 UTC, Jan Gutter
X61T: lspci -v with apic=debug initcall_debug debug (7.82 KB, text/plain)
2007-09-16 09:23 UTC, Jan Gutter
X61T: lspci -xxx with apic=debug initcall_debug debug (19.54 KB, text/plain)
2007-09-16 09:24 UTC, Jan Gutter
X61T: dmesg with apic=debug initcall_debug debug (showing spurious interrupt) (97.30 KB, text/plain)
2007-09-17 09:25 UTC, Jan Gutter
X61T: /proc/interrupts with apic=debug initcall_debug debug (1.09 KB, text/plain)
2007-09-17 09:28 UTC, Jan Gutter
Get the info using ./test 0xfed1c000 0x4000 result (4.19 KB, application/x-gzip)
2007-09-24 21:33 UTC, ykzhao
X61T: result of "./test 0xfed1c000 0x4000 result" (16.00 KB, application/octet-stream)
2007-09-25 00:45 UTC, Jan Gutter
X61T: latest 2.6.23-rc8 config (55.04 KB, text/plain)
2007-09-27 12:14 UTC, Jan Gutter
Add bogus IRQ counts to /proc/interrupts HACK (1.51 KB, patch)
2007-10-09 15:31 UTC, Benjamin Herrenschmidt

Description Jan Gutter 2007-08-07 04:10:58 UTC
Most recent kernel where this bug did not occur: none

Distribution: Gentoo stable

Hardware Environment: Lenovo Thinkpad x61 Tablet
(model 7767-B8G)
Intel Core2 Duo, Santa Rosa chipset

Software Environment: Linux kernel, Gentoo stable userspace 

Problem Description:

Bootup + a couple of minutes, the following appears in dmesg:

irq 20: nobody cared (try booting with the "irqpoll" option)
 [<c0158264>] __report_bad_irq+0x24/0x90
 [<c0158541>] note_interrupt+0x271/0x2b0
 [<c0157745>] handle_IRQ_event+0x25/0x60
 [<c0158ced>] handle_fasteoi_irq+0xbd/0xf0
 [<c01071eb>] do_IRQ+0x9b/0xc0
 [<c0104d6f>] common_interrupt+0x23/0x28
 [<f892a263>] uhci_irq+0x23/0x170 [uhci_hcd]
 [<c041b94d>] _spin_unlock+0xd/0x30
 [<c0149e61>] tick_handle_oneshot_broadcast+0x131/0x140
 [<f8905262>] usb_hcd_irq+0x22/0x60 [usbcore]
 [<c0157745>] handle_IRQ_event+0x25/0x60
 [<c0158ca9>] handle_fasteoi_irq+0x79/0xf0
 [<c0158c30>] handle_fasteoi_irq+0x0/0xf0
 [<c01071c3>] do_IRQ+0x73/0xc0
 [<c0104d6f>] common_interrupt+0x23/0x28
 [<c02c25fa>] acpi_processor_idle+0x22e/0x3f0
 [<c02c23cc>] acpi_processor_idle+0x0/0x3f0
 [<c02c23cc>] acpi_processor_idle+0x0/0x3f0
 [<c0102444>] cpu_idle+0x74/0xd0
 [<c05a1a2a>] start_kernel+0x2da/0x360
 [<c05a1140>] unknown_bootoption+0x0/0x1f0
 =======================
handlers:
[<f8905240>] (usb_hcd_irq+0x0/0x60 [usbcore])
Disabling IRQ #20

After this the USB ports on the right side of the notebook are dead (as expected).

Running with irqpoll caused my system to lock up hard once; I haven't tried it lately. I assume it's buggy firmware from Lenovo: I'd be happy to test any patches you can throw at me. Attaching full dmesg in next post.
Comment 1 Jan Gutter 2007-08-07 04:14:43 UTC
Created attachment 12294
dmesg of 2.6.23-rc2

The 'set_level status: 0' messages are acpi-video's vain attempts to set the brightness. It's independent of the spurious IRQ.
Comment 2 Volker Braun 2007-08-08 06:57:16 UTC
I get the same error on my T61 6564-CTO (15.4" widescreen), but on irq 19 instead. Afterwards, the two horizontal USB ports on the right side are dead. The one vertical USB port on the left side still works.

I can confirm that kernels 2.6.21 and 2.6.22 are affected.

Presumably something is wrong with the USB irq, or something caused a non-USB interrupt on irq 19.

========================================================
Jul  3 23:51:54 thinkpad kernel: irq 19: nobody cared (try booting with the "irqpoll" option)
Jul  3 23:51:54 thinkpad kernel: 
Jul  3 23:51:54 thinkpad kernel: Call Trace:
Jul  3 23:51:54 thinkpad kernel:  <IRQ>  [<ffffffff802ba4ee>] __report_bad_irq+0x30/0x72
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff802ba6fd>] note_interrupt+0x1cd/0x20e
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff802bafce>] handle_fasteoi_irq+0xa9/0xd1
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff802654a3>] do_IRQ+0xf1/0x15f
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff80257631>] ret_from_intr+0x0/0xa
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff88004d6b>] :uhci_hcd:uhci_irq+0x20/0x153
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff88019bec>] :ehci_hcd:ehci_irq+0x27/0x182
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff803bf04b>] usb_hcd_irq+0x24/0x52
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff8020f931>] handle_IRQ_event+0x25/0x53
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff802bafb9>] handle_fasteoi_irq+0x94/0xd1
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff802654a3>] do_IRQ+0xf1/0x15f
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff80257631>] ret_from_intr+0x0/0xa
Jul  3 23:51:54 thinkpad kernel:  <EOI>  [<ffffffff8039900d>] acpi_processor_idle+0x2a0/0x4a1
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff80399003>] acpi_processor_idle+0x296/0x4a1
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff80398d6d>] acpi_processor_idle+0x0/0x4a1
Jul  3 23:51:54 thinkpad kernel:  [<ffffffff80244248>] cpu_idle+0x8c/0xaf
Jul  3 23:51:54 thinkpad kernel: 
Jul  3 23:51:54 thinkpad kernel: handlers:
Jul  3 23:51:54 thinkpad kernel: [<ffffffff803bf027>] (usb_hcd_irq+0x0/0x52)
Jul  3 23:51:54 thinkpad kernel: Disabling IRQ #19
========================================================

Something generates a lot of interrupts on irq 19, even if nothing is plugged in:

[root@thinkpad ~]# cat /proc/interrupts 
           CPU0       CPU1       
  0:     554859          0   IO-APIC-edge      timer
  1:       5536          0   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc
  9:        200       1688   IO-APIC-fasteoi   acpi
 12:      61800          0   IO-APIC-edge      i8042
 14:       6789       8642   IO-APIC-edge      libata
 15:        122         21   IO-APIC-edge      libata
 16:          0          0   IO-APIC-fasteoi   yenta, uhci_hcd:usb3
 17:      24303          0   IO-APIC-fasteoi   uhci_hcd:usb4, HDA Intel, iwl4965
 18:          1        424   IO-APIC-fasteoi   uhci_hcd:usb5, sdhci:slot0
 19:       1983      98018   IO-APIC-fasteoi   ehci_hcd:usb7
 20:     223291          0   IO-APIC-fasteoi   uhci_hcd:usb1
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
 22:          3          0   IO-APIC-fasteoi   ehci_hcd:usb6
2298:       1487          0   PCI-MSI-edge      eth0
NMI:          0          0 
LOC:     554802     554779 
ERR:          0
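Spotting the line that accumulates interrupts with nothing plugged in comes down to summing the per-CPU columns of /proc/interrupts. A minimal Python sketch for a pasted snapshot, assuming the usual layout (a CPU header row, then one "label: counts... type handlers" row per line); the function names are hypothetical and the sample is a subset of the output above.

```python
SAMPLE = """\
           CPU0       CPU1
  0:     554859          0   IO-APIC-edge      timer
 19:       1983      98018   IO-APIC-fasteoi   ehci_hcd:usb7
 20:     223291          0   IO-APIC-fasteoi   uhci_hcd:usb1
 22:          3          0   IO-APIC-fasteoi   ehci_hcd:usb6
NMI:          0          0
ERR:          0
"""

def parse_proc_interrupts(text):
    """Return {label: (total_count, description)} from /proc/interrupts text."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts (ERR has only one).
        while fields and len(counts) < ncpus and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(fields))
    return table

def busiest(table, min_count, match=""):
    """Numeric IRQ lines with at least min_count interrupts whose
    description contains `match`, busiest first."""
    hits = [(irq, total, desc) for irq, (total, desc) in table.items()
            if irq.isdigit() and total >= min_count and match in desc]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

if __name__ == "__main__":
    for irq, total, desc in busiest(parse_proc_interrupts(SAMPLE), 50_000, "hcd"):
        print(irq, total, desc)
```

With a 50,000 threshold and an "hcd" filter it flags IRQ 20 (223,291) and IRQ 19 (1,983 + 98,018 = 100,001). Two snapshots taken a few seconds apart and subtracted give a per-line rate.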

The full lspci and lsusb output can be found here:
http://carrot.hep.upenn.edu/wiki/doku.php?id=thinkpad:system_information
Comment 3 Volker Braun 2007-08-08 07:14:12 UTC
Two more comments: I am running Fedora 7 x86_64, the OP Gentoo i386 (correct me if I am wrong), so it is not a 32/64-bit issue.

Also, there was a similar discussion concerning a T61 on the linux-usb-users list
http://www.mail-archive.com/linux-usb-users@lists.sourceforge.net/msg18856.html
No definite solution was identified.
Comment 4 Jan Gutter 2007-08-08 13:01:30 UTC
As Volker said: this is clearly not restricted to my thinkpad model ;-)

Also (might be related, might not), my power button doesn't seem to generate ACPI events: /proc/acpi/button/power/PWRF/info reads "type: Power Button (FF)", and the lid and AC seem to generate events, but nothing comes from PWRF.
Comment 5 Jan Gutter 2007-08-08 13:44:09 UTC
Something else that's VERY interesting:

cat /proc/interrupts

           CPU0       CPU1       
  0:     720651        763   IO-APIC-edge      timer
  1:        461          4   IO-APIC-edge      i8042
  5:         17          1   IO-APIC-edge      serial
  8:         38          1   IO-APIC-edge      rtc
  9:       6218         40   IO-APIC-fasteoi   acpi
 12:       9299         69   IO-APIC-edge      i8042
 14:          0          0   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 16:      93926          0   IO-APIC-fasteoi   uhci_hcd:usb3
 17:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 18:          1          1   IO-APIC-fasteoi   uhci_hcd:usb5, yenta, \ i915@pci:0000:00:02.0
 19:          2          1   IO-APIC-fasteoi   ehci_hcd:usb1
 20:     200000          1   IO-APIC-fasteoi   ehci_hcd:usb2
 21:      19210          1   IO-APIC-fasteoi   uhci_hcd:usb6, ohci1394, \
HDA Intel, ipw3945
 23:          0          0   IO-APIC-fasteoi   sdhci:slot0
220:        627       1489   PCI-MSI-edge      eth0
221:      26335          7   PCI-MSI-edge      ahci
NMI:          0          0 
LOC:      14171     263680 
ERR:          0
MIS:          0


Notice the nice round 200000 at IRQ20?

And yes, there seem to be a lot of interrupts on the USB bus with no physical activity.
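One plausible reading of the round number, assuming the spurious-IRQ detector in kernel/irq/spurious.c of this era: it evaluates each line in windows of 100,000 interrupts and disables the line when more than 99,900 of a window went unhandled. The line can then only be disabled at a window boundary, so a count sitting on a multiple of 100,000 is what you would expect once a window trips. This is a simplified Python model with hypothetical names, not kernel code; it ignores any time-based aging of the unhandled counter that the real code may apply.

```python
from itertools import chain, repeat

WINDOW = 100_000          # interrupts per evaluation window (assumed)
UNHANDLED_LIMIT = 99_900  # disable if more than this many were unhandled in one window

def count_at_disable(handled_flags):
    """Return the running interrupt total at which the line would be disabled,
    or None if it survives. handled_flags yields True when some handler
    claimed the interrupt. Simplified: no time-based aging of the counter."""
    in_window = 0
    unhandled = 0
    for total, handled in enumerate(handled_flags, start=1):
        if not handled:
            unhandled += 1
        in_window += 1
        if in_window < WINDOW:
            continue
        # Window boundary: decide, then start a fresh window.
        in_window = 0
        if unhandled > UNHANDLED_LIMIT:
            return total
        unhandled = 0
    return None

if __name__ == "__main__":
    # 1,000 genuine interrupts, then an endless stream nobody claims.
    stream = chain(repeat(True, 1_000), repeat(False, 1_000_000))
    print(count_at_disable(stream))  # 200000
```

In the example, 1,000 handled interrupts early on keep the first window just under the limit, so the line survives until the count reaches 200,000.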
Comment 6 Jan Gutter 2007-08-15 02:43:43 UTC
Bump just to verify that 2.6.23-rc3 (latest git as of today) is still affected by this.
Comment 7 Klaus S. Madsen 2007-09-03 23:50:26 UTC
I see the same problem on a Thinkpad T61P (Also with Santa Rosa chipset). Verified that the bug also is present with rc4. Will test with rc5 tonight.
Comment 8 Klaus S. Madsen 2007-09-04 11:19:07 UTC
The problem persists with rc5.
Comment 9 ykzhao 2007-09-10 02:38:07 UTC
Will you please upload the acpidump with boot option acpi=off? 
The acpidump tool can be found at http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtool_20070714_debug.
Thanks.
Comment 10 Jan Gutter 2007-09-10 03:21:30 UTC
Created attachment 12770
ACPI dump of TP X61T

This is the Thinkpad X61T (model 7767-B8G)'s acpidump. It's the same whether acpi=off is passed or not. And believe you me, the laptop's basically unusable with acpi=off!

Lastly, the power button seems to be back in action. You actually need to press it for a second or so, instead of just "clicking" it. Dunno if it came right because of a newer kernel, or if I just didn't press it long enough in the past...
Comment 11 Klaus S. Madsen 2007-09-10 05:39:38 UTC
Created attachment 12772
Output of acpidump running on a Thinkpad T61P (6460-6XG)

Here is the output from acpidump running on a T61P, which also shows the problem.
Comment 12 ykzhao 2007-09-10 22:39:57 UTC
Thanks for the acpidump info.
Will you please try it with the boot options pci=noacpi apic=debug?
If the system boots successfully, please attach the following info:
a. dmesg
b. lspci -v
c. /proc/interrupts
Thanks.
Comment 13 Jan Gutter 2007-09-10 23:43:56 UTC
Created attachment 12784
X61T: dmesg with pci=noacpi apic=debug
Comment 14 Jan Gutter 2007-09-10 23:44:51 UTC
Created attachment 12785
X61T: /proc/interrupts with pci=noacpi apic=debug
Comment 15 Jan Gutter 2007-09-10 23:45:28 UTC
Created attachment 12786
X61T: lspci -v with pci=noacpi apic=debug
Comment 16 Jan Gutter 2007-09-10 23:49:53 UTC
Just some other random info: with pci=noacpi it seems everything is set to irq 10, which reflects the settings in the BIOS. I should also mention that IRQ 5 is "reserved" for the serial port used by the Wacom tablet.
Comment 17 Shaohua 2007-09-11 02:03:26 UTC
please try acpi=noirq, and give us the /proc/interrupts

Also, if you have Windows XP installed on the system, please let us know the device interrupt assignment in Windows. This can help us narrow down the issue.
Comment 18 Jan Gutter 2007-09-11 02:49:55 UTC
Created attachment 12787
X61T: /proc/interrupts with acpi=noirq

Very little difference between pci=noacpi apic=debug and acpi=noirq
Comment 19 Jan Gutter 2007-09-11 02:52:18 UTC
Created attachment 12788
X61T: Interrupt assignment in Vista Business (32-bit)

I manually copied this out from Device Manager: human error might have slipped in, so if anything is extremely weird, just ask me to verify. Anyone know a good tool to use in Vista to get this info without printscreen and OCR? ;-)
Comment 20 Klaus S. Madsen 2007-09-11 07:38:14 UTC
Do you want me to collect the same data for my T61p?
Comment 21 Jan Gutter 2007-09-12 07:27:44 UTC
Ok, I've verified that I copied out the Vista information correctly. For future reference: the tool to use is msinfo32 (ships with Windows). I really hate booting into Vista, but this is for a good cause ;-) msinfo32 doesn't show the type (PCI/ISA) of the interrupts. If you require more detailed info and have a program to obtain it, I'd be happy to run it!

msinfo32 lists the negatively numbered IRQs as unsigned numbers:
-5 == 4294967291,
-2 == 4294967294
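Both pairs are consistent with msinfo32 printing a 32-bit two's-complement value as unsigned (2^32 - 5 = 4294967291, 2^32 - 2 = 4294967294). A small Python sketch for converting in either direction, assuming that 32-bit width; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def as_unsigned32(n):
    """Signed IRQ number -> the unsigned value msinfo32 displays."""
    return n & MASK32

def as_signed32(u):
    """msinfo32's unsigned display -> signed 32-bit IRQ number."""
    return u - (1 << 32) if u & 0x80000000 else u

if __name__ == "__main__":
    for irq in (-5, -2):
        print(irq, "==", as_unsigned32(irq))
```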
Comment 22 ykzhao 2007-09-15 23:46:36 UTC
Hi, Jan Gutter
Thanks for the info.
Will you please check whether PCI debugging is enabled in the kernel configuration? If it is disabled, please enable it.
Please upload the following info with the boot options apic=debug initcall_debug debug.
  a. dmesg
  b. lspci -v
  c. lspci -xxx
Thanks.
Comment 23 Jan Gutter 2007-09-16 09:21:38 UTC
Created attachment 12841
X61T: kernel config for the next 3 attachments

This is the kernel config under which the next three attachments were made: Yes, I did forget PCI_DEBUG and ACPI_DEBUG on the previous ones!
Comment 24 Jan Gutter 2007-09-16 09:23:02 UTC
Created attachment 12842
X61T: dmesg with apic=debug initcall_debug debug
Comment 25 Jan Gutter 2007-09-16 09:23:36 UTC
Created attachment 12843
X61T: lspci -v with apic=debug initcall_debug debug
Comment 26 Jan Gutter 2007-09-16 09:24:18 UTC
Created attachment 12844
X61T: lspci -xxx with apic=debug initcall_debug debug
Comment 27 ykzhao 2007-09-17 08:08:52 UTC
Hi, Jan Gutter
From comment #24 it seems that the error has disappeared.
Will you please check whether the error still exists with 2.6.23-rc6?
If it still fails, please attach the dmesg that contains the error info.
If it works well, please try another version and attach the following info (dmesg, /proc/interrupts, lspci -v).
Thanks.
Comment 28 Jan Gutter 2007-09-17 08:17:10 UTC
The error still appears after a couple of minutes, and the dmesg was taken before that happened. I'll post the dmesg with the error in the next message. All the tests are done with the latest -git sources.
Comment 29 Jan Gutter 2007-09-17 09:25:58 UTC
Created attachment 12848
X61T: dmesg with apic=debug initcall_debug debug (showing spurious interrupt)

This is the dmesg showing the spurious interrupt
Comment 30 Jan Gutter 2007-09-17 09:28:00 UTC
Created attachment 12849
X61T: /proc/interrupts with apic=debug initcall_debug debug

IRQ 20 has a rate of ~211 interrupts/sec before it gets killed.
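As a rough cross-check, assuming the kernel's spurious-IRQ detector evaluates a line every 100,000 interrupts, a steady ~211 interrupts/s reaches the first check after 100000 / 211 ≈ 474 s, just under 8 minutes, broadly in line with the original report of the failure appearing minutes after boot. A small Python helper for doing the same from two /proc/interrupts readings; the names are hypothetical, and it assumes the /proc count and the kernel's internal window counter started together, which need not hold exactly.

```python
WINDOW = 100_000  # assumed evaluation window of the spurious-IRQ detector

def irq_rate(count_before, count_after, seconds):
    """Interrupts per second between two /proc/interrupts readings of one line."""
    return (count_after - count_before) / seconds

def seconds_to_window_end(current_count, rate, window=WINDOW):
    """Seconds until the count next reaches a multiple of `window`
    (assumes the /proc count and the kernel's window counter are aligned)."""
    remaining = window - (current_count % window)
    return remaining / rate

if __name__ == "__main__":
    print(round(seconds_to_window_end(0, 211)))  # 474
```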
Comment 31 ykzhao 2007-09-21 00:36:07 UTC
Hi, Jan Gutter
We have a T61P (15.4" widescreen) at hand. We tested kernels 2.6.23-rc2, rc3, rc5 and rc6, but we can't reproduce the error on our system.
Comment 32 Klaus S. Madsen 2007-09-21 00:47:11 UTC
I have a T61P 15.4 widescreen (exact model: 6460-6XG), and this shows the problem, approximately 10 minutes after boot:

[  627.024000] irq 23: nobody cared (try booting with the "irqpoll" option)
[  627.024000]  [<c015b5d4>] __report_bad_irq+0x24/0x80
[  627.024000]  [<c015b892>] note_interrupt+0x262/0x2a0
[  627.024000]  [<f88b16c2>] usb_hcd_irq+0x22/0x60 [usbcore]
[  627.024000]  [<c015aaf0>] handle_IRQ_event+0x30/0x60
[  627.024000]  [<c015c27b>] handle_fasteoi_irq+0xbb/0xf0
[  627.024000]  [<c0106b1b>] do_IRQ+0x3b/0x70
[  627.024000]  [<c0105223>] common_interrupt+0x23/0x30
[  627.024000]  [<f8862977>] acpi_processor_idle+0x246/0x41f [processor]
[  627.024000]  [<f8862731>] acpi_processor_idle+0x0/0x41f [processor]
[  627.024000]  [<c0102413>] cpu_idle+0x53/0xe0
[  627.024000]  =======================
[  627.024000] handlers:
[  627.024000] [<f88b16a0>] (usb_hcd_irq+0x0/0x60 [usbcore])
[  627.024000] Disabling IRQ #23

I'll be happy to help fix this problem.
Comment 33 Klaus S. Madsen 2007-09-21 00:48:46 UTC
Oh, by the way, the above message is from the Ubuntu Gutsy kernel, but the same thing happens with 2.6.23-rc5 (which is the last one I tested).
Comment 34 Jan Gutter 2007-09-21 01:13:50 UTC
Might it be something in the BIOS settings or with specific hardware options? I believe there is a Lenovo tool to transfer the BIOS settings between similar model Thinkpads... I have read on a mailing list somewhere that a firmware update solved a similar problem once, that's why I assumed the answer's not necessarily directly linked to the kernel.
Comment 35 Klaus S. Madsen 2007-09-21 01:24:36 UTC
Might be. There is a newer version of the BIOS available for my laptop (it's currently running Version 7LET44WW (1.14-1.06) and the newest one is Version 7LET51WW (1.22-1.06)). I'll try to upgrade tomorrow, and see if the problem persists.

I'll be happy to try the bios settings migration tool also. However I don't remember changing any of the BIOS settings, so it should be as close to factory settings as possible.
Comment 36 ykzhao 2007-09-24 21:33:43 UTC
Created attachment 12922
Get the info using ./test 0xfed1c000 0x4000 result

Hi, Jan Gutter
Will you please get the info using the attached files? 
How to use this tool is described in the readme file.
    ./test 0xfed1c000 0x4000 result.
Thanks.
Comment 37 Jan Gutter 2007-09-25 00:45:37 UTC
Created attachment 12923
X61T: result of "./test 0xfed1c000 0x4000 result"

The dmesg has these lines:
simple: module license 'unspecified' taints kernel.
addr=0xfed1c000,len=0x4000<7>0xfed1c000
0x4000
d4ca4000,f8d48000,0x4000
line=88,end over
Exit the module

Note: this was taken *BEFORE* the spurious IRQ. Do you need me to re-run after the IRQ occurred?
Comment 38 Jan Gutter 2007-09-25 11:21:53 UTC
I found something interesting, by pure accident today: if I use the rfkill switch (the one that disables both the bluetooth radio and the Wifi radio), the interrupts don't count up, AND I don't get the error!

If I use the rfkill switch to disable the radios, the following disappears from lsusb:

Bus 003 Device 003: ID 0a5c:2110 Broadcom Corp.

Likewise lspci -xxx has a difference in the following section:

00:1c.1 PCI bridge: Intel Corporation PCI Express Port 2 (rev 03)

rfkill not used (radios enabled):
e0: 00 0f c7 83 06 07 08 00 33 00 00 00 00 00 00 00
rfkill used (radios disabled):
e0: 00 0f c7 03 06 07 08 00 33 00 00 00 00 00 00 00
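The two dumps differ in exactly one byte. A Python sketch that diffs two `lspci -xxx` rows and reports which bits flipped; the helper name is hypothetical, and it only locates the change (offset 0xe3, bit 7, of the 00:1c.1 port's config space). What that bit controls is chipset-specific and would have to be looked up in the chipset documentation.

```python
def diff_config_row(row_a, row_b, base):
    """Compare two hex rows from `lspci -xxx` starting at config offset `base`.
    Return (offset, old, new, flipped_bits) for every byte that differs."""
    a = bytes.fromhex(row_a)
    b = bytes.fromhex(row_b)
    changes = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            flipped = [bit for bit in range(8) if (x ^ y) >> bit & 1]
            changes.append((base + i, x, y, flipped))
    return changes

RADIOS_ON = "00 0f c7 83 06 07 08 00 33 00 00 00 00 00 00 00"
RADIOS_OFF = "00 0f c7 03 06 07 08 00 33 00 00 00 00 00 00 00"

if __name__ == "__main__":
    for off, old, new, bits in diff_config_row(RADIOS_ON, RADIOS_OFF, 0xE0):
        print(f"offset {off:#x}: {old:#04x} -> {new:#04x}, bits {bits}")
```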
Comment 39 Volker Braun 2007-09-25 12:24:47 UTC
For the record, the "ID 0a5c:2110 Broadcom Corp" part is the internal Bluetooth. The bt adapter is implemented as a USB device, basically it is a USB Bluetooth stick without the stick. It is supported via the hci_usb module.

Why did the bt driver not care about the irq?
Comment 40 Jan Gutter 2007-09-25 12:41:06 UTC
Yep, I know, BUT I didn't think about it because the bluetooth device is clearly bound to usb 3-2, IRQ #16, from cat /proc/interrupts and dmesg. I can see the interrupts counting up with it enabled (and transferring stuff makes it count up faster, I think). IRQ #20 is bound to ehci_hcd:usb2, which has nothing bound to it, except the right-side USB ports.

usb 3-1 is the fingerprint reader, FWIW (also sharing IRQ #16).
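Which IRQ a given USB bus sits on can be read from the `*hci_hcd:usbN` handler names in /proc/interrupts: lsusb's "Bus 003" and the kernel's "usb 3-2" both refer to bus 3. A Python sketch of that lookup, fed with lines from the /proc/interrupts snapshot in comment 5; the function name is hypothetical.

```python
import re

HCD_BUS = re.compile(r"[uoe]hci_hcd:usb(\d+)")

def usb_bus_irqs(proc_interrupts_text):
    """Map USB bus number -> IRQ label, from handler names like 'uhci_hcd:usb3'."""
    mapping = {}
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header row or wrapped continuation line
        for bus in HCD_BUS.findall(rest):
            mapping[int(bus)] = label.strip()
    return mapping

SNAPSHOT = """\
 16:      93926          0   IO-APIC-fasteoi   uhci_hcd:usb3
 19:          2          1   IO-APIC-fasteoi   ehci_hcd:usb1
 20:     200000          1   IO-APIC-fasteoi   ehci_hcd:usb2
 21:      19210          1   IO-APIC-fasteoi   uhci_hcd:usb6, ohci1394
"""

if __name__ == "__main__":
    irqs = usb_bus_irqs(SNAPSHOT)
    print("Bus 003 ->", irqs[int("003")])  # lsusb bus numbers are zero-padded
```

On that snapshot, bus 3 (Bluetooth and fingerprint reader) maps to IRQ 16 and bus 2 (the right-side ports) to IRQ 20.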
Comment 41 ykzhao 2007-09-25 23:33:58 UTC
Hi, Jan Gutter
Thanks for the info. It is unnecessary to re-run.
Comment 42 Benjamin Herrenschmidt 2007-09-26 00:22:39 UTC
I have a similar problem on a T61 7959-AB8, here it's irq 23 which is normally only assigned to one of the USB controllers. I haven't yet tried with the rfkill switch off. The problem occurs about 10 minutes after boot on a mostly idle machine.

I'll send all the logs & dumps etc... tonight or tomorrow unless the problem is found in the meantime :-)
Comment 43 Shaohua 2007-09-26 19:10:39 UTC
Did any USB guys look at the issue?
We checked the chipset config, and found the interrupt routing info is correct and Linux is doing the right thing so far.
Comment 44 Christian Lachner 2007-09-26 23:14:38 UTC
I also have a T61 (7664-18G) and often experience those IRQ switch-offs (#19 and #23). This only happens when the kill switch is turned on, which <speculation>is physically connecting the integrated bluetooth-dongle using usb</speculation> and creating an acpi-event to enable the wifi-card. I took out my iwl4965 and inserted an Atheros 5418 and the problem still remains. IMHO it has something to do with the bluetooth-dongle or the fingerprint reader, which are attached using usb. My T61 does not have the integrated camera.
Is there still a need for any logs or dumps?
Comment 45 Benjamin Herrenschmidt 2007-09-26 23:25:19 UTC
On mine, I had it happen on irq #21 today instead of the usual irq #23, which is weird. Both have USB UHCIs on them and I think the Bluetooth HCI is not on either of those 2 (I'll double check). IRQ #23 also has sdhci on it (though my distro kernel didn't attach a driver to it). I'm starting to wonder if something's wrong with those UHCIs... well, UHCI is pretty wrong by definition but maybe something is more wrong than usual here :-)
Comment 46 Jan Gutter 2007-09-27 02:44:41 UTC
Ok, a quick plot recap: (please correct me on anything, I'm just a clueless newbie!)

1. On certain model Thinkpads, we get an "IRQ XXX: nobody cared" message after a few minutes of uptime.
2. The IRQ *should be* connected to the USB driver handling the right-side USB ports, but the driver doesn't think it is, hence the ports are disabled.
3. If the rfkill switch is set (wireless disabled), the error does not seem to occur.
4. The bluetooth device unplugged by the switch is connected to a different USB bus entirely (i.e. NOT the right-side ports).
5. We've had some smart firmware guys look over the code, and it doesn't look like the problem is with the ACPI routing of the IRQs.

Ok, now my own inferences (which might be less accurate):

1. If an IRQ fires and none of the drivers connected to that IRQ handle it, "irq NN: nobody cared" occurs.
2. This could mean two things: ACPI routing is busted -> the kernel associates the wrong IRQ to the driver, or the driver is busted -> the driver ignores an IRQ that it should handle, or misconfigures the hardware.
3. If the ACPI routing is OK, this would mean the USB chipset driver is the one to blame?
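Inference 1 can be made concrete with a toy model of a shared line: every handler registered on the IRQ is called, each reports whether its own device raised the interrupt, and if none does, the interrupt counts as unhandled. This is a Python sketch of the idea, not kernel code; the enum stands in for the kernel's IRQ_NONE/IRQ_HANDLED return values, and `status_handler` with its 0x3F pending mask is a hypothetical controller handler.

```python
from enum import Enum

class IrqReturn(Enum):
    """Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values."""
    NONE = 0
    HANDLED = 1

def dispatch_shared_irq(handlers):
    """Run every handler on a shared line and report whether any claimed it."""
    results = [handler() for handler in handlers]
    return IrqReturn.HANDLED if IrqReturn.HANDLED in results else IrqReturn.NONE

def status_handler(read_status, pending_mask=0x3F):
    """Hypothetical controller handler: claim the IRQ only if the device's
    status register shows a pending-interrupt bit."""
    def handler():
        return IrqReturn.HANDLED if read_status() & pending_mask else IrqReturn.NONE
    return handler

if __name__ == "__main__":
    # An idle controller with no pending bits declines the interrupt,
    # so the line is left unhandled ("nobody cared" territory).
    idle = status_handler(lambda: 0x1000)
    print(dispatch_shared_irq([idle]))  # IrqReturn.NONE
```

Under this model a correct driver returning "not mine" is indistinguishable from a buggy one ignoring real work, which is why both explanations in inference 2 stay open.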

Finally, some burning questions:

1. What's the next step? Bug the USB guys again?
2. Also, does the fact that I have a nice, round 200000 interrupts recorded on the IRQ signify anything?
Comment 47 Benjamin Herrenschmidt 2007-09-27 03:02:47 UTC
Another possible explanation could be some other device we don't have a driver for loaded at the moment (some legacy stuff, whatever) asserting the IRQ line, but the fact that it -changed- IRQ line for me today makes this less likely.

One thing I'm wondering is whether the IRQ is just a short interrupt or is actually asserted continuously. A way to find out would be to print on every occurrence and not only after the count reaches 100 (and still disable it tho). If the prints are all together and then it gets disabled, then it's probably asserted by something. If not, then it's a "short" interrupt, and thus is harmless, and the kernel is being a bit too harsh at disabling it.

Basically, a short IRQ is an IRQ from a device that got "caught" by the APIC, but by the time it's actually serviced by the processor, it's gone. There can be multiple reasons for that. It could be an APIC problem where some IRQs end up occasionally dispatched to multiple CPUs (I'm not too familiar with the x86 APICs so I don't know if that can be a problem), or it could be some HW issue where the IRQ output line from a chip, such as a UHCI controller, takes a bit too long to go down after it's been acked on the chip. In the latter case, by the time it actually goes down, the line may already have been unmasked by the APIC and recorded as a new interrupt. That sort of thing...

At this stage, I suspect that the Intel folks are in the best spot to figure that out, though I can try tomorrow to figure out if it's a short interrupt problem or if there's actually a fully asserted interrupt happening (easy, just printk every time it's unhandled rather than after 100 iterations and look at the timestamps).
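Classifying the timestamps from such an every-occurrence printk could look like the Python sketch below. The names and the 1 ms cut-off between "back-to-back" and "spread out" are arbitrary assumptions; it only encodes the heuristic described in this comment (clustered prints suggest a line held asserted, separated prints suggest short interrupts).

```python
import re
from statistics import median

PRINTK_TS = re.compile(r"^\[\s*(\d+\.\d+)\]")

def printk_timestamps(lines):
    """Pull '[  627.024000]'-style timestamps (seconds since boot) out of dmesg lines."""
    stamps = []
    for line in lines:
        m = PRINTK_TS.match(line)
        if m:
            stamps.append(float(m.group(1)))
    return stamps

def classify(stamps, burst_gap=0.001):
    """'burst' if unhandled IRQs arrive back-to-back (line probably held asserted),
    'spread' if they are separated (more like isolated short interrupts)."""
    if len(stamps) < 2:
        return "too few samples"
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return "burst" if median(gaps) < burst_gap else "spread"
```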
Comment 48 Matthew Garrett 2007-09-27 05:05:41 UTC
I have the same issue on an HP nc2510p (a 965-based system), so it's not just Thinkpads.
Comment 49 Jan Gutter 2007-09-27 05:24:59 UTC
Does the HP also have a bluetooth module and does it also disable some external USB ports?
Comment 50 Matthew Garrett 2007-09-27 05:57:23 UTC
It has bluetooth, but the interrupt disabled is the one for the firewire interface. It's not flagged as being shared with anything else.
Comment 51 Len Brown 2007-09-27 10:13:37 UTC
(Matthew, let's file the HP failure in a different bug report --
we have a little T61 community forming here and although
it would be great if the HP were the same, we're rarely so lucky:-)

I've got a T61 here on my desk which got IRQ 16 disabled
when running the Debian 2.6.22-1-amd64 kernel with usb5 and yenta on that IRQ;
but I've not reproduced the failure with any kernel.org kernels yet.
Comment 52 Len Brown 2007-09-27 10:26:01 UTC
whelp, added yenta to my 2.6.23-rc8 kernel and still no failure.

          CPU0       CPU1
  0:      32281      32211   IO-APIC-edge      timer
  1:          3          5   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc
  9:        338        329   IO-APIC-fasteoi   acpi
 12:       1170       1162   IO-APIC-edge      i8042
 14:       1094       1120   IO-APIC-edge      ide0
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5, yenta
 17:          1          1   IO-APIC-fasteoi   ohci1394, uhci_hcd:usb6
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb7
 19:         18         23   IO-APIC-fasteoi   ehci_hcd:usb2
 20:         17          9   IO-APIC-fasteoi   uhci_hcd:usb3
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
1273:        449        416   PCI-MSI-edge      eth1
1274:        860        887   PCI-MSI-edge      ahci
NMI:          0          0
LOC:      64446      64423
ERR:          0

$ lsusb
Bus 006 Device 001: ID 0000:0000
Bus 002 Device 001: ID 0000:0000
Bus 007 Device 001: ID 0000:0000
Bus 005 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000
Bus 004 Device 001: ID 0000:0000
Bus 003 Device 002: ID 0483:2016 SGS Thomson Microelectronics Fingerprint Reader
Bus 003 Device 001: ID 0000:0000


BIOS Information
        Vendor: LENOVO
        Version: 7LET39WW (1.09 )
        Release Date: 05/14/2007

can somebody ship me a .config for 2.6.23-rc8 that fails?
Also, what method are you using to tweak the RF switch?
Comment 53 Jan Gutter 2007-09-27 12:14:05 UTC
Created attachment 12967
X61T: latest 2.6.23-rc8 config

This is similar to attachment 12841, but I have made a few changes since (suggested by linuxpowertop.org), so I'm reposting. The main difference is that CONFIG_IRQBALANCE and CONFIG_ACPI_DEBUG are not set, but that doesn't affect the error.

Also, the rfkill switch is a physical switch located just slightly to the left of the lid switch, on the bottom edge of the laptop. Slid to the left, radios are disabled (usb disconnect event for bluetooth, wifi disabled), slid to the right, radios are enabled (usb connect event on usb 3-2, wifi enabled).
Comment 54 Paul 2007-09-27 13:31:01 UTC
I had the exact same errors. The Thinkpad T61 permits disabling Bluetooth from the BIOS (under Security settings). Since I disabled Bluetooth, my laptop has been running without errors for 2-3 hours now. I also disabled some other things such as the serial and parallel ports, but I feel it is the Bluetooth that is causing the problem because:

- The problem only happens when the rfkill switch is on, never with it off
- The rfkill switch controls both wlan and bluetooth
- I've been running wlan all morning with Bluetooth disabled. No problems.

Hope this helps.
Comment 55 Benjamin Herrenschmidt 2007-09-27 18:29:22 UTC
Hrm... looks like the current kernel code is smart enough to differentiate short interrupts from really stale ones, so there goes my explanation. I'll still add some instrumentation to the interrupt code to see if I can spot something fishy, but so far it seems like a genuine unhandled interrupt.
Comment 56 Klaus S. Madsen 2007-09-27 23:38:22 UTC
I tried disabling only the Bluetooth from the BIOS, and I can confirm that the IRQ isn't disabled. The machine has been running for 45 minutes now, and the USB ports are still working. This is on a T61p (6460-6XG).
Comment 57 Paul 2007-09-28 12:51:54 UTC
I've been running for 2 days now with Bluetooth disabled on the Thinkpad T61 7664-16u, with wireless enabled. Not a single problem.
Comment 58 Len Brown 2007-10-04 12:21:11 UTC
Installation of Fedora Core 7 i386 2.6.21-1.3194.fc7
failed on my T61.

The installation could not find the SATA/AHCI drive.
dmesg showed an irq20 nobody cared -- the irq shared
by yenta, uhci_hcd:usb4, and libata.   Also, looking
at the dmesg and lspci output, the graphics device
is also connected to this pin, though it doesn't
seem to have a linux driver loaded.

(note that IRQ20 would be called IRQ16 on x86_64
 to match GSI 16, since only i386 has the bogus irq compression code)

ACPI: PCI Interrupt 0000:15:00.0[A] -> GSI 16 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 16 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 20
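Finding everything that shares a line is easiest from these `ACPI: PCI Interrupt` routing messages. A Python sketch that groups them by IRQ; the regular expression assumes exactly the message format shown above, and the function name is hypothetical.

```python
import re
from collections import defaultdict

ROUTE = re.compile(
    r"ACPI: PCI Interrupt ([^\s\[]+)\[([A-D])\] -> GSI (\d+) \(([^)]*)\) -> IRQ (\d+)"
)

def devices_by_irq(dmesg_text):
    """Group PCI functions by the Linux IRQ that ACPI routed them to."""
    groups = defaultdict(list)
    for m in ROUTE.finditer(dmesg_text):
        dev, pin, gsi, trigger, irq = m.groups()
        groups[int(irq)].append((dev, pin, int(gsi), trigger))
    return dict(groups)

DMESG = """\
ACPI: PCI Interrupt 0000:15:00.0[A] -> GSI 16 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 16 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 20
"""

if __name__ == "__main__":
    for irq, devs in devices_by_irq(DMESG).items():
        print(f"IRQ {irq}: {len(devs)} devices:", ", ".join(d[0] for d in devs))
```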

15:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ba)
Interrupt: pin A routed to IRQ 20

00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03) (prog-if 00 [UHCI])
Interrupt: pin A routed to IRQ 20

00:1f.2 IDE interface: Intel Corporation Mobile SATA IDE Controller (rev 03) (prog-if 80 [Master])
Interrupt: pin B routed to IRQ 20

00:02.0 VGA compatible controller: Intel Corporation Mobile Integrated Graphics Controller (rev 0c) (prog-if 00 [VGA])
Interrupt: pin A routed to IRQ 20

Entering BIOS SETUP and changing SATA mode to "Compatibility"
from "AHCI", I was able to install FC7 w/o problems --
and libata ends up on IRQ14 plus IRQ15.
Comment 59 Benjamin Herrenschmidt 2007-10-04 15:12:11 UTC
On mine, the IRQ that gets busted doesn't have ata on it:

 23:      50102      49899   IO-APIC-fasteoi   ehci_hcd:usb7

(I think the Ricoh sdhci thingy is also routed there tho, I don't have a driver loaded for it at the moment).
Comment 60 Jan Gutter 2007-10-05 04:07:58 UTC
Last week a new BIOS came out for the Thinkpad X61T, this is just to confirm that it hasn't fixed anything:
it was 7SET18WW (1.04), now it's 7SET20WW (1.06), baseboard firmware is still on 7RHT16WW-1.02, though.

Len, did you have bluetooth disabled (either via kill-switch, or the security menu in the BIOS) when you did your Fedora install? Is the bluetooth USB device really the common denominator here?
Comment 61 Benjamin Herrenschmidt 2007-10-07 20:38:56 UTC
All right, so I did some digging and came up with more data, though no solution at this point:

First, some data about my laptop: it's a T61, and the "stray" IRQ is #23, which is apparently only routed to one of the EHCI controllers (the one that is paired with the UHCIs that control the leftmost ports and the Bluetooth dongle). I don't see anything else on that IRQ line, at least via /proc/interrupts or whatever else I can find, but there might of course be something on the mobo...

I've hacked the kernel IRQ code (I'll attach a patch) to add a counter on each IRQ line of how many bogus interrupts happened in total (interrupts that no handler accepted, that is, an IRQ_NONE result). I've also hacked the threshold for disabling the IRQ so that if there's more than 1 jiffy between 2 occurrences, it will consider the IRQ spurious and not stray, and thus won't disable it.

My findings so far are that this IRQ line is subject to a flood of about 200 bogus IRQs per second whenever the Bluetooth dongle is active. They cause the kernel to switch the IRQ off after a while, because the delay between two of them isn't long enough for the kernel to consider them as simple short interrupts.

It doesn't seem to be a stray interrupt: since this is a level IRQ line, that would result in one CPU being completely tied up until the IRQ gets disabled, which is not the case. We just get this continuous stream of 200 bogus IRQs per second, which don't get acked anywhere, so that's really strange. It looks like there's a 200Hz square wave connected to that IRQ line.
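The disable decision described above can be modelled roughly as follows. This is a simplified Python restatement of my understanding of the bookkeeping in `note_interrupt()` in the 2.6.2x `kernel/irq/spurious.c`. That bookkeeping uses a window of 100,000 interrupts and disables the line if more than 99,900 of them went unhandled. The unhandled counter restarts when two unhandled IRQs are more than HZ/10 jiffies apart. The constants, HZ=1000, and all function names are assumptions for illustration, not taken from this report. The model treats every interrupt as unhandled.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW = 100_000          # interrupts per evaluation window (assumed)
UNHANDLED_LIMIT = 99_900  # disable if more unhandled than this in a window (assumed)

@dataclass
class IrqDesc:
    irq_count: int = 0
    irqs_unhandled: int = 0
    last_unhandled: int = 0
    disabled: bool = False

def model_note_interrupt(desc: IrqDesc, handled: bool, jiffies: int, reset_gap: int) -> None:
    """Simplified model of the spurious-IRQ accounting done for one interrupt."""
    if not handled:
        # An isolated unhandled IRQ (long gap since the previous one) restarts
        # the count, so only a sustained flood can reach the limit.
        if jiffies - desc.last_unhandled > reset_gap:
            desc.irqs_unhandled = 1
        else:
            desc.irqs_unhandled += 1
        desc.last_unhandled = jiffies
    desc.irq_count += 1
    if desc.irq_count < WINDOW:
        return
    # End of a window: decide whether "nobody cared" about this line.
    desc.irq_count = 0
    if desc.irqs_unhandled > UNHANDLED_LIMIT:
        desc.disabled = True
    desc.irqs_unhandled = 0

def seconds_until_disabled(rate_hz: int, hz: int, reset_gap: int,
                           max_seconds: int = 900) -> Optional[float]:
    """Feed a steady stream of unhandled IRQs; return when the line is disabled."""
    desc = IrqDesc()
    for k in range(rate_hz * max_seconds):
        jiffies = (k * hz) // rate_hz  # tick count when the k-th bogus IRQ arrives
        model_note_interrupt(desc, handled=False, jiffies=jiffies, reset_gap=reset_gap)
        if desc.disabled:
            return k / rate_hz
    return None

HZ = 1000
print(seconds_until_disabled(200, HZ, reset_gap=HZ // 10))  # stock-like HZ/10 gap
print(seconds_until_disabled(200, HZ, reset_gap=1))         # 1-jiffy gap, as in the hack
```

Under these assumptions, a steady 200/s flood trips the limit after roughly 500 s. The IRQs arrive 5 jiffies apart, which is well under the HZ/10 gap, so the counter never restarts. With a 1-jiffy gap, every bogus IRQ restarts the counter, and the line is never disabled.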

Now, I've tried various things with USB & Bluetooth; the results are as follows:

 - First, the interrupt in question seems to have only the EHCI on it. However, BT doesn't use EHCI (it's not a high-speed device); it uses UHCI. So EHCI should only be involved in the initial port connect sequence. I've verified this is the case: the EHCI driver then switches the controller to the HALT state, and whenever those stray IRQs happen, the EHCI status register always contains 0x1000, which means halted and no interrupt pending. So something "else" is toggling that IRQ line in the background.

 - If I rmmod uhci_hcd (the _U_HCI driver where BT is connected), the flood stops. If I modprobe it again, it restarts.

 - If I killswitch (HW switch), the flood stops, it restarts if I re-enable BT
 - If I kill via /proc/acpi/ibm/bluetooth, same effect (both cause the USB device to disconnect/reconnect).
 - If I go to the sysfs file of the BT HCI device and do echo suspend >power/state, the flood stops. If I do echo auto >power/state, the flood resumes.

So at this point, it looks like the IRQ line is "shorted" with a 200Hz output from the BT dongle, which is strange. Unfortunately, the BT dongle is some Broadcom part for which I can't download a data sheet or errata (and that wouldn't necessarily help anyway).
Comment 62 Benjamin Herrenschmidt 2007-10-09 15:31:15 UTC
Created attachment 13090 [details]
Add bogus IRQ counts to /proc/interrupts HACK

This hack adds counters of bogus interrupts to /proc/interrupts (in parentheses after the list of attached devices). This helps show where the problem is. You can see that number starting to increase at about 200HZ on an IRQ line as soon as BT is enabled. Which IRQ line is affected seems to depend on the machine model. The exact same problem has been reported on other 965GM-based machines, such as some HP models.
Comment 63 Len Brown 2007-10-09 15:58:22 UTC
reproduced the failure per Ben's description using FC8-T3.
I needed to
1. enable physical wireless kill switch (and see BT logo light up)

That caused the BT device to appear to lsusb, which it wasn't before:

Bus 003 Device 004: ID 0a5c:2110 Broadcom Corp.

2. Then enable the device:
# hciconfig hci0 up

And watched GSI 19 (IRQ 21 on this box)
[root@t61 ibm]# dmesg |grep 'IRQ 21'
ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 19 (level, low) -> IRQ 21

increment at 200/sec
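A counter that increments "at 200/sec" can be measured by diffing two `/proc/interrupts` snapshots. Below is a minimal sketch that sums the per-CPU columns for each numbered IRQ. It assumes the usual layout: the IRQ number, one count per CPU, then the chip and handler names. It ignores non-numeric rows such as `NMI` or `LOC`. The helper names are illustrative only.

```python
import re
import time

# "<irq>:  <count per CPU...>  <chip>  <handlers>"
ROW_RE = re.compile(r"^\s*(\d+):\s+((?:\d+\s+)*\d+)\s*(.*)$")

def parse_interrupts(text):
    """Map IRQ number -> (total count across CPUs, trailing chip/handler text)."""
    table = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            total = sum(int(n) for n in m.group(2).split())
            table[int(m.group(1))] = (total, m.group(3).strip())
    return table

def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    return {irq: (b[irq][0] - a[irq][0]) / seconds for irq in a.keys() & b.keys()}

def sample_rates(seconds=5.0, path="/proc/interrupts"):
    """Take two live snapshots `seconds` apart and return the per-IRQ rates."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return irq_rates(before, after, seconds)
```

For example, `sample_rates(5.0)[21]` would give the rate for this box's IRQ 21. The Linux IRQ number differs between machines (and boots), so look it up in dmesg first.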

Confirmed that "hciconfig hci0 down" stops the issue,
as well as switching off the wireless radio switch.

It sure looks like some sort of BT driver needs to register on GSI 19
to handle the interrupts the hardware is sending there...
Comment 64 Benjamin Herrenschmidt 2007-10-09 16:50:50 UTC
I don't totally agree with this BT driver idea. Here's why:

 - First, this is a PCI IRQ line, so it's level-triggered, active low. However, what we see is a steady 200Hz stream, so it doesn't behave like a level interrupt; it looks more like a square wave, an edge interrupt, or a parasitic copy of another interrupt.

 - BT is a USB device. It shouldn't have a direct IRQ line.

I wonder if maybe what happens is that all IRQs for the paired UHCI where BT is connected end up duplicated/shunted to the IRQ line of the EHCI... that would cause something like that. Maybe a misrouting in the chipset ?
Comment 65 Klaus S. Madsen 2007-10-09 23:02:44 UTC
I don't know if this is helpful but:

On a mailing list I saw someone with a T61 who had all his USB ports working until he installed a driver for the wireless card. I.e. this guy had Bluetooth enabled and everything had been working for weeks, until he installed the driver for the wireless card, and suddenly all the USB ports on one side of his laptop stopped working.

I haven't had the time to verify this myself, but I'm wondering whether both Bluetooth and wireless need to be enabled for the problem to occur. Maybe it's triggered by something the wireless driver does when initializing the card?
Comment 66 Jan Gutter 2007-10-09 23:40:02 UTC
Unfortunately I can't confirm this with my X61T. I've just disabled my wireless Intel 3945ABG (in the "security" tab in the BIOS) and confirmed that the PCI device is not even listed in lspci. The error still occurs on usb-2, BUT the IRQ has shifted to #19. This seems to confirm that the error follows the usb-2 line no matter how ACPI enumerates the IRQs. Bluetooth still looks 100% correlated with the error.
Comment 67 Benjamin Herrenschmidt 2007-10-10 00:05:20 UTC
I haven't seen any relationship to wifi either. The IRQ line seems to always be GSI 19 on a couple of machines around here. Is that the case for everyone else?

The GSI number is the "real" number; the number displayed by Linux in /proc/interrupts can change between boots. To see the GSI number, look in the kernel log for the message that shows the mapping.

For example, currently, the "bad" interrupt for me in linux is IRQ 22, and I can see this in dmesg:

ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 19 (level, low) -> IRQ 22
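Because the Linux number can change between boots, it can help to resolve the GSI to the current IRQ from the log, as described above. Below is a minimal sketch. It assumes the `-> GSI N (...) -> IRQ M` format shown in this message. `linux_irqs_for_gsi` is a hypothetical helper.

```python
import re

# Matches "... -> GSI 19 (level, low) -> IRQ 22"
GSI_RE = re.compile(r"-> GSI (\d+) \([^)]*\) -> IRQ (\d+)")

def linux_irqs_for_gsi(dmesg_text, gsi):
    """Return the sorted Linux IRQ numbers that the kernel log reports for `gsi`."""
    irqs = set()
    for line in dmesg_text.splitlines():
        m = GSI_RE.search(line)
        if m and int(m.group(1)) == gsi:
            irqs.add(int(m.group(2)))
    return sorted(irqs)

log = "ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 19 (level, low) -> IRQ 22"
print(linux_irqs_for_gsi(log, 19))  # [22]
```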
Comment 68 Klaus S. Madsen 2007-10-10 00:13:05 UTC
Yes. I've checked a couple of my kern.log files, and it's always GSI 19 that shows up.

I'll try to get the guy on the other (danish) mailing list to see if he can reproduce the problem without the wireless driver.
Comment 69 Benjamin Herrenschmidt 2007-10-10 17:44:40 UTC
I checked a couple of coworkers' T61s and X61s, and it's also always GSI 19. I then booted Windows XP on mine and checked the IRQ assignment in the Device Manager; it shows only the EHCI on that interrupt, so there doesn't seem to be any other driver attached there...

Maybe windows doesn't care about bogus IRQs as much as we do or uses a shorter time threshold to differentiate short IRQs from real stale ones.
Comment 70 Jan Gutter 2007-10-11 01:50:18 UTC
The USB ID of the Bluetooth chip refers to the Broadcom BCM2045B:
 http://www.broadcom.com/products/Bluetooth/Bluetooth-RF-Silicon-and-Software-Solutions/BCM2045

Isn't this chip also present in the X60/T60 series? If so, I think that supports what Benjamin said: if the same Bluetooth chip doesn't cause any errors on the X60/T60 series, then something specific to the way the X61/T61 series implements it in hardware may be the cause.
Comment 71 Benjamin Herrenschmidt 2007-10-11 02:05:49 UTC
Either that, or it's a problem with the EHCI/UHCI combination ... some misconfiguration of the chipset that would cause the IRQs from that specific UHCI controller to be "mirrored" on the EHCI line...
Comment 72 Jens Bang 2007-10-17 03:38:29 UTC
(In reply to comment #68)
> I'll try to get the guy on the other (danish) mailing list to see if he can
> reproduce the problem without the wireless driver.

I think I'm the guy Klaus is talking about. I have been very busy for the last week, so I haven't had time to get into this until now. Sorry about that.

On my T61 the right-hand USB ports stop working approx. 10 min after I turn BT and wireless on with the master kill switch. But my USB ports all worked fine until I installed ndiswrapper and the Windows driver (after 2 weeks of no USB problems copying 75+ GB of data over the right-hand ports), so it has to have something to do with that.

What do you need from me in terms of testing this, and in terms of output before and after? I'm a bit of a Linux newbie, so I'll probably need quite detailed descriptions of what to do. If you don't want to spam this discussion with descriptions like that please e-mail me directly.

The first thing I need to find out is how to unload (or uninstall?) a Windows wireless driver loaded using ndiswrapper.
Comment 73 Benjamin Herrenschmidt 2007-10-17 03:47:28 UTC
I doubt it's related. I never used nor installed ndiswrapper or any such thing and I've tracked the thing down pretty low level. I suspect if you run my bogus IRQ counting patch, you'll see them flowing even without ndiswrapper.

At this stage, I suspect there is little any of us can do except wait for somebody from either Intel or Lenovo to dig into the HW and see what's going on. I don't have enough knowledge about those x86 chipsets to go down there myself.
Comment 74 Jan Gutter 2007-10-17 04:16:28 UTC
I think so too, someone's either going to need to get their hands dirty with a scope or some sort of firmware emulator (if the problem is HW), or check out the USB chip initialization (if the problem is SW). It would be very strange if the IRQ line got connected to some sort of 200HZ clock.

Are we reaching the limit of useful information that we're gathering here? Would it be productive to narrow the problem definitively down to the (Santa Rosa)/(BCM2045B and 3945ABG) combination?

Finally, the ugliest solution is a quirk in the driver. But if the problem is in the motherboard routing, that might be the only solution.
Comment 75 Paul 2007-10-17 09:41:40 UTC
Sorry guys, but this is what I just found, which tends to indicate it has nothing to do with wifi or bluetooth being on or off specifically (but my earlier observations hold: turning off the bluetooth via the BIOS made my laptop very stable, and turning them on makes it very unstable, but....):

I was trying to copy 7 Gigs over the LAN card. The wifi/bluetooth switch was *OFF*.

Eventually the NIC stopped functioning. ifup/ifdown would not restore it.

Here's what I found in the system log:

Oct 17 09:01:06 ThinkpadT61 avahi-daemon[3186]: Registering new address record for 10.0.0.3 on wlan0.
Oct 17 09:01:07 ThinkpadT61 kernel: irq 177: nobody cared (try booting with the "irqpoll" option)
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c01402e3>] __report_bad_irq+0x2b/0x69
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c01404d0>] note_interrupt+0x1af/0x1e7
Oct 17 09:01:07 ThinkpadT61 kernel:  [<f887d49e>] usb_hcd_irq+0x23/0x50 [usbcore]
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c013fae7>] handle_IRQ_event+0x23/0x49
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c013fbc0>] __do_IRQ+0xb3/0xe8
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c01050e5>] do_IRQ+0x43/0x52
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c01036b6>] common_interrupt+0x1a/0x20
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c01e007b>] acpi_ex_create_method+0x9f/0xa3
Oct 17 09:01:07 ThinkpadT61 kernel:  [<f88495ac>] acpi_processor_idle+0x1ec/0x380 [processor]
Oct 17 09:01:07 ThinkpadT61 kernel:  [<c0101b52>] cpu_idle+0x9f/0xb9
Oct 17 09:01:07 ThinkpadT61 kernel: handlers:
Oct 17 09:01:07 ThinkpadT61 kernel: [<f887d47b>] (usb_hcd_irq+0x0/0x50 [usbcore])
Oct 17 09:01:07 ThinkpadT61 kernel: Disabling IRQ #177

I have noticed this before: copying large amounts of data (gigabytes) over the T61 NIC kills it. In fact, I'm pretty much convinced I can't copy more than approx. 1 GB without failure (this is over sftp using nautilus, if that's of any relevance). Please note that the NIC died, not nautilus, and ifup/ifdown did not restore it.

Thinkpad T61, Debian Linux, kernel 2.6.18-5-686.

I'm not the most technical tool in the shed but if there is anything I can do to help, let me know. 

Maybe running bluetooth/wifi, or heavy NIC traffic, puts some kind of load on the system... in other words, it's not those devices specifically that cause the problem, just what those devices demand of the hw/sw? Remove (or reduce) the demand and the problem becomes less frequent. (I'm such a newb...)
Comment 76 Thomas B. Rücker 2007-10-17 10:17:01 UTC
Just a sidenote on the NIC issue:
Usually "rmmod e1000 && modprobe e1000" takes care of this. (Yes this cures only a symptom)

I remember seeing this behaviour a lot (like every minute) when copying at GBit rates from a fileserver to a SATA drive attached to an ExpressCard controller. Reloading e1000 and reconfiguring the interface would be needed every 1-2 minutes, sometimes 3 minutes if the throughput was lower. I was amazed, though, that the TCP connection survived the 20-30x module reloads...
Comment 77 Volker Braun 2007-10-17 12:01:14 UTC
As for the e1000 ethernet, I had problems with some old kernels. Upgrading to a current one solved them. Transferring multiple GB works perfectly on 2.6.22. I do not have a gigabit ethernet for testing, though.
Comment 78 mathew 2007-10-17 12:42:33 UTC
Just a "me too".

dmesg shows "Disabling IRQ #23" @ 625 seconds. If I then plug in a USB device, USB isn't working via the 2 left ports. The right port works.

T61. No fingerprint reader, nothing plugged in to USB until after the problem occurs, no ndiswrapper, problem occurs even if using VESA X.org drivers (i.e. not due to proprietary Nvidia driver), problem also occurs if networked via ethernet with no wifi.

/proc/interrupts shows 23: CPU0 64688 CPU1 35313 IO-APIC-fasteoi timer ehci_hcd:usb7

Brand new ThinkPad from Lenovo, so it should be the latest BIOS. Let me know if there's any other useful information I can provide.
Comment 79 Benjamin Herrenschmidt 2007-10-17 14:56:38 UTC
Thomas, can you try my hack to report the bogus IRQ counts and tell me what the counters look like in /proc/interrupts? I added the bogus IRQ count in parentheses at the end of each line.

It's also possible that this is an unrelated problem. That is, I can see how a network driver using NAPI could cause occasional (or even frequent) short interrupts, and older kernels such as your 2.6.18 don't even try to ignore them, so you get a timebomb waiting to happen.

Just a possibility though, worth investigating. Can you also try a 2.6.23 kernel ?
Comment 80 Jan Gutter 2007-10-22 05:33:05 UTC
I've upgraded to kernel 2.6.23-gentoo and the new BIOS for the X61T: 7SET21WW-1.07. Embedded controller is still 7RHT16WW-1.02.

The problem still occurs :-(

But, now I'm experiencing interrupts on GSI-19 at roughly 144/sec (as opposed to 200/sec). I'm not sure what caused the frequency change, but if you think it'll help, I can re-run on one of the older kernel configs + version.
Comment 81 Christian Lachner 2007-10-23 23:14:38 UTC
I flashed the 7KET56WW-1.26 BIOS released today for my T61 (7664-18G), and it also didn't fix the problem.

Do the people at lenovo know of that problem/bug? If it's bios related they probably should know about it...
Comment 82 Xiaoyang Yu 2007-10-24 00:14:30 UTC
A similar bug report is on Ubuntu's Launchpad:

Installed Gutsy can not boot on Santa Rosa 
https://bugs.launchpad.net/bugs/153823
Comment 83 Ingvar Hagelund 2007-10-29 08:48:23 UTC
Similar on Thinkpad R61 7732 (also with Santa Rosa chipset) with fedora-development (ie soon-to-be fedora 8), kernel 2.6.23.1-37.fc8 on i386. x86_64 with similar kernel has the same problem.

Ingvar


The kernel says:

irq 21: nobody cared (try booting with the "irqpoll" option)
 [<c045b0c6>] __report_bad_irq+0x36/0x75
 [<c045b2dc>] note_interrupt+0x1d7/0x213
 [<c057abd5>] usb_hcd_irq+0x21/0x4e
 [<c045a787>] handle_IRQ_event+0x23/0x51
 [<c045ba67>] handle_fasteoi_irq+0x86/0xa6
 [<c045b9e1>] handle_fasteoi_irq+0x0/0xa6
 [<c04074c3>] do_IRQ+0x8c/0xb9
 [<c041cb3e>] lapic_next_event+0xc/0x10
 [<c0443712>] clockevents_program_event+0xb5/0xbc
 [<c0405b6f>] common_interrupt+0x23/0x28
 [<c04400d8>] hres_timers_resume+0x33/0x5f
 [<c061e911>] _spin_unlock_irqrestore+0xa/0x13
 [<c0443ae0>] tick_notify+0x1cc/0x2e4
 [<c040ae46>] sched_clock+0x8/0x20
 [<c0425ee1>] __update_rq_clock+0x1c/0x149
 [<c06204a3>] notifier_call_chain+0x2a/0x47
 [<c0437a2c>] raw_notifier_call_chain+0x17/0x1a
 [<c0443741>] clockevents_notify+0x19/0x3c
 [<c0537d38>] acpi_idle_enter_simple+0x174/0x1c4
 [<c05a2049>] cpuidle_idle_call+0x5c/0x7f
 [<c05a1fed>] cpuidle_idle_call+0x0/0x7f
 [<c040340b>] cpu_idle+0xab/0xcc
 =======================
handlers:
[<c057abb4>] (usb_hcd_irq+0x0/0x4e)
Disabling IRQ #21
Comment 84 James Peverill 2007-10-29 09:19:20 UTC
  I have a T61P and can confirm exact symptoms.  Is there someone at Lenovo and/or Intel we should be bugging to investigate this?  It certainly reeks of a hardware issue.

James
Comment 85 Andreas Schneider 2007-10-31 02:37:14 UTC
The problem is gone on my Thinkpad T61 (766314G) with the latest BIOS version 1.26.
Comment 86 Jan Gutter 2007-10-31 03:19:24 UTC
My BIOS(1.07) for the X61T is dated 10 October, I see 1.26 for the T61 is dated 18 October. The changelog for both of these look similar. Mine definitely still shows the problem, though.

Andreas, would it be a lot to ask if you could possibly flash the previous version, and verify that it's definitely been fixed? I usually just check /proc/interrupts before and after a BIOS update.

If this is really fixed in the BIOS, it might also be applicable to other makes and models sporting the Santa Rosa chipset...

PS. Kudos to Lenovo for fixing a bug in the BIOS on the T61's that improves Linux compatibility! (The fix for: Volume and mute buttons on the keyboard do not work on Linux.)
Comment 87 Paul C. Bryan 2007-10-31 11:07:08 UTC
I can confirm I have this on a Lenovo ThinkPad X61, using Ubuntu 7.10 ( 2.6.22-14-generic AMD64 kernel).

Are there any known workarounds to this issue? Sounds like neither irqpoll nor noapic are good ones.
Comment 88 Ingvar Hagelund 2007-10-31 16:19:21 UTC
Latest BIOS (7KET56WW - 1.26-1.06 - 2007/10/18) does NOT fix the problem on R61 7732-1EG (Santa Rosa chipset) running Fedora devel on kernel 2.6.23.1-42.fc8.

Ingvar
Comment 89 Benjamin Herrenschmidt 2007-10-31 16:47:01 UTC
Same here, 1.26-1.06 (7LET56WW) on a T61, problem still present.
Comment 90 Oliver Neukum 2007-11-05 00:48:43 UTC
This has been debugged to death from a USB viewpoint. It sometimes happens without USB devices being plugged in. The trigger seems to be that the plugging event wakes the machine up.
Comment 91 Benjamin Herrenschmidt 2007-11-05 02:59:17 UTC
Ugh? Oliver, are we talking about the same problem here? The machine -is- already up; there is no suspend/resume process involved as far as I can tell. It's purely that when the USB Bluetooth device becomes active on the internal USB, the interrupt line of the EHCI that is paired with the UHCI that drives it starts getting that 200HZ or so spurious interrupt flood. Do you think it could be some kind of wakeup thing coming from the EHCI? It's not in D3 state... I've tried reading the status reg from it, and it doesn't show any irq condition...
 
Comment 92 Paul 2007-11-05 08:23:42 UTC
I've compiled and installed kernel 2.6.23.1 with the realtime patch patch-2.6.23.1-rt5, noapci, to do some audio work on my T61. The problem appears even without the wifi/bluetooth switch off but with the RT patch the problem occurs quite frequently. Basically I lose the mouse and the keyboard giving the system the appearance of being frozen. I found that plugging in an external keyboard (and pressing some keys in it?) that the mouse and keyboard starts working again and I can continue. Then eventually I have to do it again. 

I had the exact same problem when I boot into Windows XP (and plugging in an ext keyboard had the same effect, i.e. it "revives" the USB), but now the problem in Windows XP has gone away completely (after installing upgrades from Lenovo and Microsoft, not sure what fixed it). 

The problem persists in Linux. Same laptop, dual boot, most recent BIOS.

Hopefully this provides some clues.
Comment 93 Paul 2007-11-05 08:24:51 UTC
Sorry, that should have read "even with the wifi/bluetooth switch off".
Comment 94 Dax Kelson 2007-11-06 20:05:56 UTC
I have a T61p, with the same problem since I got it in August.

irq 23: nobody cared (try booting with the "irqpoll" option)
 [<c045ad5e>] __report_bad_irq+0x36/0x75
 [<c041c6a4>] lapic_timer_broadcast+0x11/0x12
 [<c045af74>] note_interrupt+0x1d7/0x213
 [<c0579389>] usb_hcd_irq+0x21/0x4e
 [<c045a41f>] handle_IRQ_event+0x23/0x51
 [<c045b6ff>] handle_fasteoi_irq+0x86/0xa6
 [<c045b679>] handle_fasteoi_irq+0x0/0xa6
 [<c04074cf>] do_IRQ+0x8c/0xb9
 [<c05246fe>] acpi_hw_register_read+0xf1/0x156
 [<c0405b6f>] common_interrupt+0x23/0x28
 [<c053667e>] acpi_processor_idle+0x2a7/0x445
 [<c05363d7>] acpi_processor_idle+0x0/0x445
 [<c05363d7>] acpi_processor_idle+0x0/0x445
 [<c040340b>] cpu_idle+0xab/0xcc
 [<c073ba6c>] start_kernel+0x32c/0x334
 [<c073b177>] unknown_bootoption+0x0/0x195
 =======================
handlers:
[<c0579368>] (usb_hcd_irq+0x0/0x4e)
Disabling IRQ #23

kernel: 2.6.23.1-10.fc7
Comment 95 Oliver Neukum 2007-11-14 15:36:24 UTC
#91, I was under the mistaken impression that leaving the C3 state causes it. But it rather seems to be an interrupt routing issue.
Comment 96 christian 2007-11-19 00:31:36 UTC
I now switched to 64 Bit and the problem seems to be gone ..... 
Comment 97 Cyril Jaquier 2007-11-19 23:56:14 UTC
Same here too. T61 with 7LET56WW (1.26-1.06) and Debian kernel (2.6.22 or 2.6.23).
Comment 98 Lorenzo 2007-11-21 01:59:59 UTC
One (ugly) way to make this bug go away: recompile the kernel without EHCI support (under "Device Drivers --> USB support"). You're left with the VERY SLOW UHCI USB 1.1, but at least all USB ports work. I'm using Gentoo, with kernel
Linux Challenger 2.6.22-gentoo-r8 #11 SMP PREEMPT  
on a new thinkpad T61.
Comment 99 Theodore Tso 2007-11-21 05:15:38 UTC
I have an X61s with the latest (1.08, 9/26/2007) BIOS and 2.6.24-rc3, and I'm seeing exactly the same behavior. Disabling Bluetooth or unloading the ehci_hcd module makes the problem go away; unloading and reloading the ehci_hcd module seems to restore the broken USB behavior.

I have not tried going to a 64bit-kernel, but I suppose that would be an interesting next step.
Comment 100 Jan Gutter 2007-11-23 01:41:23 UTC
SuSE's bugzilla also has this bug logged: https://bugzilla.novell.com/show_bug.cgi?id=325601

Does anyone on this (now sizable!) mailing list have a contact at Lenovo who's able to help us here? If not, the only option might be a (gulp!) workaround.

Also, #97, does "same here" mean you also suffer from the bug, or does it mean 64-bit solved it for you?
Comment 101 Benjamin Herrenschmidt 2007-11-23 12:56:40 UTC
I have a contact at lenovo who is investigating, but I have no more infos on what the status is there.
Comment 102 Cyril Jaquier 2007-11-24 10:39:03 UTC
(In reply to comment #100)
...
> Also, #97, does "same here" mean you also suffer from the bug, or does it
> mean 64-bit solved it for you?

Sorry :/ I also suffer from this bug using Debian AMD64.
Comment 103 Tomas Carnecky 2007-11-26 00:22:44 UTC
(In reply to comment #91)
> Ugh ? Oliver, are we talking about the same problem here ? The machine -is-
> already up, there is no suspend/resume process involved as far as I can tell.
> It's purely that when the USB bluetooth device becomes active on the internal
> USB, the interrupt line of the EHCI that is paired with the UHCI that drives
> it starts getting that 200HZ or so spurrious interrupt flood. Do you think it
> could be some kind of wakeup thing coming from the EHCI ? It's not in D3
> state... I've tried reading the status reg from it and it doesn't show any
> irq condition ...

Speaking of wakeup and EHCI: I'm trying to debug #9258, and only recently I saw 'irq 19 nobody cared' in the dmesg output. But that only happened right after resuming from C3/C4 (I don't remember which). I haven't seen that message in other situations (e.g. when the laptop runs normally). The laptop is an X61 Tablet with 2.6.24-rc3, without any Bluetooth support compiled in. If you think these two bugs could be related or want more info about my setup, just give me a shout.
Comment 104 Will Kemp 2007-12-02 06:55:49 UTC
(In reply to comment #96)
> I now switched to 64 Bit and the problem seems to be gone ..... 

I've just installed the 64 bit kernel on my R61 and the problem definitely hasn't gone away for me.

By the way, a limited workaround for me is to plug the mouse into one of the two USB ports that have the problem. When the kernel disables interrupts, the mouse carries on working. Then i've still got the LHS port to plug other things into.
Comment 105 Christian Lachner 2007-12-16 22:26:40 UTC
There is a new BIOS-Update available for the R61/T61's (2.07/1.08) which is supposed to fix the interrupt problem. http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=lenovo&lndocid=MIGR-67989

Hopefully that annoying bug is finally fixed - TESTING :)
Comment 106 Benjamin Herrenschmidt 2007-12-17 00:13:50 UTC
Applied it and I see no more spurious EHCI interrupts. Looks good here!
Comment 107 Jan Gutter 2007-12-17 00:47:35 UTC
That's awesome news. Now I just have to wait for the X61T BIOS...

Thanks everyone for your help! Thanks Lenovo for fixing this bug too!

So, is there a bug resolution for "fixed in hardware"? ;-)

Jan
Comment 108 Theodore Tso 2007-12-17 06:31:10 UTC
Well, for R61 and T61 owners, yes. For X60/X61 owners, it's "hopefully Lenovo will see fit to fix it in a BIOS update sometime soon". (I.e., it's not fixed for everyone just yet, but there is hope that it will be fixed in firmware, and it strongly suggests that it can't be fixed in the kernel, so we can probably close the bug report....)
Comment 109 Dax Kelson 2007-12-17 11:14:27 UTC
Confirmed fixed on my T61p.
Comment 110 Cyril Jaquier 2007-12-17 13:37:42 UTC
Fixed on my T61 too.
Comment 111 Nemanja Stefanovic 2007-12-17 13:45:25 UTC
Same here. Thanks for the notice Christian!
Comment 112 Benjamin Herrenschmidt 2007-12-19 13:26:57 UTC
According to a contact at Lenovo, the fix is already in the latest X61 BIOS update. Ted, can you verify ?
Comment 113 Paul C. Bryan 2007-12-19 13:38:53 UTC
The latest BIOS update I see on the Lenovo site is v1.10 (24 Oct 2007). I don't see any reference in its changelog as I do in the T61 such as, "(Fix) Unexpected interrupts from the USB controller may occur." Is there a link to a more recent X61 BIOS update?
Comment 114 Benjamin Herrenschmidt 2007-12-19 13:52:04 UTC
That may well be the one. Can you verify that the problem still occurs with this version of the BIOS ?
Comment 115 Oliver Neukum 2007-12-20 02:23:38 UTC
http://www-307.ibm.com/pc/support/site.wss/MIGR-67988.html

This bug is mentioned in the changelog. If you don't update the BIOS, you must give "noirqdebug" on the kernel command line. It'll shorten your time on battery though.
Comment 116 Paul C. Bryan 2007-12-20 07:06:36 UTC
@Oliver: It's mentioned in the changelog for the R61/T61/T61p, not for the X61 or X61 Tablet.

@Benjamin: I'll upgrade my BIOS to 1.10 tonight or tomorrow and let you know how it affects my system.
Comment 117 Will Kemp 2007-12-20 15:04:44 UTC
There doesn't appear to be a BIOS update that applies to my R61 (8932-A13) model. According to the Lenovo site, it's not one of the models supported by the BIOS update referred to above. The BIOS version on this computer starts with 7O, not 7L or 7K, as specified in the Lenovo BIOS update pages.

So it looks like this problem isn't solved for me yet.
Comment 118 Benjamin Herrenschmidt 2007-12-20 15:54:13 UTC
I assume you have verified that the latest BIOS available for your model still has the problem regardless of whether the changelog talks about the fix or not ? I need to be sure before I go back to Lenovo.
Comment 119 Paul C. Bryan 2007-12-20 17:22:54 UTC
I just upgraded from BIOS version 1.03 to 1.10. I can still reproduce the problem. To wit:

[  796.048578] irq 19: nobody cared (try booting with the "irqpoll" option)
[  796.048590] 
[  796.048591] Call Trace:
[  796.048596]  <IRQ>  [<ffffffff8026aade>] __report_bad_irq+0x1e/0x80
[  796.048643]  [<ffffffff8026adc3>] note_interrupt+0x283/0x2c0
[  796.048663]  [<ffffffff8026b92d>] handle_fasteoi_irq+0xdd/0x110
[  796.048679]  [<ffffffff8020c6ab>] do_IRQ+0x7b/0x100
[  796.048690]  [<ffffffff8020a3a1>] ret_from_intr+0x0/0xa
[  796.048696]  <EOI>  [<ffffffff88025ac9>] :processor:acpi_processor_idle+0x25f/0x456
[  796.048748]  [<ffffffff88025abf>] :processor:acpi_processor_idle+0x255/0x456
[  796.048766]  [<ffffffff8802586a>] :processor:acpi_processor_idle+0x0/0x456
[  796.048779]  [<ffffffff802090c0>] cpu_idle+0x70/0xc0
[  796.048817] 
[  796.048820] handlers:
[  796.048824] [<ffffffff8807dd30>] (usb_hcd_irq+0x0/0x60 [usbcore])
[  796.048859] Disabling IRQ #19

So, I would say that the bug has *not* been fixed in the latest BIOS update for the X61.

I suppose, on the upside, now that I've updated my BIOS, my machine claims it's Energy Star compliant when it boots. :)
Comment 120 Will Kemp 2007-12-21 04:19:02 UTC
Well, i'm completely at a loss to explain this, but the problem seems to have gone away of its own accord on my system! It's an R61 (8932-A13), running Fedora 8, with a home-built 2.6.23.9 x86_64 kernel which previously had the problem - and, as far as i know, i haven't done anything to fix it.

I can't say for certain when it disappeared, as i don't normally shut down and reboot my system for days on end (i use hibernate or suspend at night). However, i think it disappeared yesterday.

The only thing i can think that could have fixed it is an automatic update of the WinXP system that's dual bootable. I almost never boot into Windows, but i did yesterday - and Windows did an automatic update. Perhaps it did some BIOS configuration that's persistent.

Naturally, i can't undo this and verify my suspicion. And i can't yet say for certain that the problem's really gone away and it's not just taking much longer than usual for Linux to kill that interrupt, for some reason. But it's currently been up for 30 mins without that interrupt being disabled.

I first noticed this morning that those two USB ports were still working, even though the uptime was about 12 hours. I had a (Fedora) kernel update yesterday and i wondered if that had fixed it, so i rebooted with a kernel that i knew still had the problem and it appears to have gone away.
Comment 121 Paul C. Bryan 2007-12-21 11:15:59 UTC
@Will: Did you change your Bluetooth usage? Did you accidentally nudge the kill switch for wireless? Both seem to have an effect on how long it takes for Linux to kill the interrupt.
Comment 122 Benjamin Herrenschmidt 2007-12-21 13:19:43 UTC
It's also possible that Fedora switched to the new BT HCI driver, which, I've been told, has been improved to generate far fewer interrupts. Thus, the IRQ shutdown is much less likely to happen, though the underlying HW issue is still there.

You can check in /proc/interrupts whether IRQs are still being counted for the EHCI to which nothing is connected.
Comment 123 Will Kemp 2007-12-21 14:03:16 UTC
Ah, yeah. Bluetooth seems to have disappeared for some reason. I haven't used it for a while so i didn't notice. I don't know why it's gone, so i guess i'll have to investigate - or maybe i'll just leave it off and get my USB ports back! ;-)

Sorry about the false alarm.
Comment 124 Scott Mcdermott 2008-01-01 22:15:11 UTC
This error is NOT corrected with the latest X61s BIOS. I received a new X61s just a few days ago, and it exhibits this behavior with BIOS version 1.11:

    thinkpad_acpi: ThinkPad BIOS 7NET30WW (1.11 ), EC 7MHT24WW-1.02

It's strange that I have 1.11, because 1.10 appears to be the latest version available on the web site.  It must be because my system is so new.  I am going through the process of configuring the system and troubleshooting any errors in the logs, and this was one of the first ones I found:

    kernel: irq 20: nobody cared (try booting with the "irqpoll" option)
    kernel:  [<c045b16a>] __report_bad_irq+0x36/0x75
    kernel:  [<c045b380>] note_interrupt+0x1d7/0x213
    kernel:  [<c057ae99>] usb_hcd_irq+0x21/0x4e
    kernel:  [<c045a807>] handle_IRQ_event+0x23/0x51
    kernel:  [<c045bb0b>] handle_fasteoi_irq+0x86/0xa6
    kernel:  [<c045ba85>] handle_fasteoi_irq+0x0/0xa6
    kernel:  [<c04074c3>] do_IRQ+0x8c/0xb9
    kernel:  [<c0405b6f>] common_interrupt+0x23/0x28
    kernel:  [<c05382e6>] acpi_idle_enter_simple+0x17b/0x1f1
    kernel:  [<c0537f40>] acpi_idle_enter_bm+0xc3/0x2ee
    kernel:  [<c05a2399>] cpuidle_idle_call+0x5c/0x7f
    kernel:  [<c05a233d>] cpuidle_idle_call+0x0/0x7f
    kernel:  [<c040340b>] cpu_idle+0xab/0xcc
    kernel:  =======================
    kernel: handlers:
    kernel: [<c057ae78>] (usb_hcd_irq+0x0/0x4e)
    kernel: Disabling IRQ #20

This occurred about 10 minutes after boot, with radios enabled.
Also note that GSI 19 (level, low) -> IRQ 20.
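For context, the "nobody cared" message comes from the kernel's spurious-interrupt detector. The Python model below is a simplified sketch of that heuristic, assuming the thresholds used by `note_interrupt()` in kernels of this era: interrupts are counted in windows of 100,000, and the line is disabled if more than 99,900 of them in one window were not claimed by any handler. It is an illustration only, not kernel code; the class and method names are hypothetical.

```python
class SpuriousIrqModel:
    """Simplified model of the kernel's 'nobody cared' heuristic (assumed thresholds)."""

    WINDOW = 100_000          # interrupts counted per sampling window
    UNHANDLED_LIMIT = 99_900  # disable if more than this many went unhandled

    def __init__(self):
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Record one interrupt; return True if this call disabled the line."""
        if self.disabled:
            return False
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return False
        # End of a window: check the unhandled total, then start a new window.
        self.count = 0
        tripped = self.unhandled > self.UNHANDLED_LIMIT
        self.unhandled = 0
        if tripped:
            self.disabled = True  # corresponds to "Disabling IRQ #NN"
        return tripped
```

Under this model, at least 100 handled interrupts per window are enough to keep the line enabled, so the delay before "Disabling IRQ #20" appears can depend on other activity on the same line.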
Comment 125 Johannes H. Jensen 2008-01-04 19:18:21 UTC
Just bumping to confirm this issue is very much present on my X61. 

Benjamin: any status update from your contact at Lenovo?
Comment 126 Benjamin Herrenschmidt 2008-01-04 19:30:10 UTC
Nope, not yet, but keep an eye on the BIOS update site, as they may not tell me right away if/when they post a fixed version.
Comment 127 Theodore Tso 2008-01-04 20:37:09 UTC
I just noticed that the 1.11 version of the BIOS is available.  On the Lenovo web site, the Driver Matrix page for the X61/X61s claims the latest BIOS version is 1.10, but if you actually click on the link (http://www-307.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-67982) it takes you to a page for the 1.11 version of the BIOS.

Having tried it, I can also confirm that it doesn't fix the problem (not that surprising: the change log doesn't mention fixing the spurious-interrupt problem, which was mentioned in the T61 firmware update). It looks like the same BIOS is used for the ThinkPad Reserve Edition, which is priced at roughly twice the normal X61s but supposedly has better service. Maybe we can find someone who purchased one to complain to Lenovo? :-)
Comment 128 Jan Gutter 2008-01-07 01:47:32 UTC
After returning from holiday, I also upgraded my X61 Tablet to 1.08 (which seems to have the same changelog as X61/X61s 1.11). Also negatory on the fix :-( It looks like the X61 and X61 Tablet BIOS code is unified, though...

BIOS 7SET22WW (1.08) has the same changelog as
BIOS 7NET30WW (1.11)

Well, since Santa obviously gave me coal for Christmas, I'll have to wait for my birthday present then....
Comment 129 Aditya Rajgarhia 2008-01-22 22:39:53 UTC
I have a T61p on which the 2 USB ports on the right did not work (this bug). However, flashing the BIOS to version 2.07 WORKS!!!

Also, previously with BIOS 1.26, adding 'irqpoll' to the boot options in GRUB and setting IRQ assignment in the BIOS to Auto-Detect for all hardware worked, but with annoying side effects (GNOME kept popping up stupid messages about my CD-ROM drive).

Anyway, let me know if you need any output or something. As far as I am concerned, bug 8853 is solved :D

Aditya
Comment 130 Jan Gutter 2008-01-23 08:10:46 UTC
I did a quick check on the Lenovo site and read the BIOS changelogs. I looked for the line:

(Fix) Unexpected interrupts from the USB controller may occur.

ThinkPad models without the fix listed (note, this does not necessarily mean they are affected by this bug):

ThinkPad R61 15 inch models (8942, 8943, 8944, 8945, 8947, 8948, 8949) 
ThinkPad X61, X61s, Reserve Edition
ThinkPad X61 Tablet
ThinkPad Z61e, Z61m, Z61p, Z61t

ThinkPad models with the fix listed:

ThinkPad R61 14.1inch widescreen with IEEE 1394, ThinkPad T61, T61p 
ThinkPad R61 and R61i 14.1inch widescreen without IEEE 1394, R61 15.4inch widescreen 

It seems that all the BIOSes in which the bug had been fixed also had their Embedded Controller (EC) revision bumped, so I assume the bug is likely in the EC firmware.

Any ETA from the man at Lenovo?
Comment 131 Will Kemp 2008-01-29 05:22:57 UTC
Thanks for pointing that out, Jan! That BIOS update doesn't come up when I enter the model ID of my ThinkPad (8932-A13) directly into the Lenovo support page, so I hadn't found it when I'd looked previously.

I've applied this update and it appears to have fixed the problem. So far, Bluetooth has been enabled for 45 minutes and the RHS USB ports are still working, with no kernel message about disabling the interrupt.

So it appears this bug is solved for the 15.4" widescreen model with IEEE 1394 - 8932-A13.
Comment 132 Jan Gutter 2008-02-04 21:24:20 UTC
I updated today, and saw a list of BIOS updates with the fix listed! Only the Z61 series BIOS doesn't list the fix, but I'm not sure whether the problem affects that series.

I didn't see any problems after 18 minutes of uptime, and no continuous stream of spurious interrupts seems to be visible. (X61 Tablet 7767-B8G is FIXED!)

Thank you all for a sterling job on identifying and isolating a niggling bug, and thanks to Lenovo for fixing it! Benjamin, would you please convey my glee to them?!?!?
Comment 133 Kevin R. Page 2008-02-06 06:39:27 UTC
Confirmed fixed here on a Thinkpad X61s when updated to BIOS 2.06 (7NETA6WW). Thanks!
Comment 134 Johannes H. Jensen 2008-02-06 07:00:48 UTC
Confirmed fixed here as well on a ThinkPad X61! Updating the BIOS without an optical drive was a bit tricky, but it worked when I followed this guide on ThinkWiki: http://www.thinkwiki.org/wiki/BIOS_Upgrade/X_Series#Approach_3:_Alternative_method_using_a_USB_stick
Comment 135 Paul C. Bryan 2008-02-06 09:59:00 UTC
Also confirmed fixed on X61. Thanks Johannes for the pointer to the alternative firmware upgrade approach -- it saved me having to set up a bootable Windows physical machine.
Comment 136 Jim Paris 2008-02-08 14:01:16 UTC
It also fixed my X61.  Updating was a real pain -- every attempt I made at using [FreeDOS or DOS 6.22] with [pxeboot+MEMDISK or USB flash drive] led to a successful boot but a system freeze during update.  Johannes' suggested method (essentially the same but requiring Windows to create the boot disk) worked for some reason.

Is it worth having the kernel check for these known-bad BIOS versions and print a warning? Or maybe something handled by HAL, similar to the battery info.is_recalled stuff?
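A rough sketch of a userspace variant of this idea, in Python: it parses a BIOS ID string in the form thinkpad_acpi prints (for example `7NET30WW (1.11 )`) and compares the version against a table of versions reported fixed. The table is hypothetical and holds only the X61/X61s data point from comment 133 (2.06, `7NETA6WW`); reading `/sys/class/dmi/id/bios_version` assumes a kernel that exposes DMI data in sysfs. As the next comment points out, DMI numbering does not always match Lenovo's published version, so any such table would need per-model verification.

```python
import re
from typing import Optional, Tuple

# Matches BIOS ID strings like "7NET30WW (1.11 )": 4-char family, 2-char build, "WW".
BIOS_ID_RE = re.compile(
    r"^(?P<family>[0-9A-Z]{4})[0-9A-Z]{2}WW\s*\((?P<major>\d+)\.(?P<minor>\d+)\s*\)"
)

# Hypothetical table: BIOS family -> a version reported fixed in this thread.
# Versions below it are treated as needing an update, which may be too strict.
REPORTED_FIXED = {
    "7NET": (2, 6),  # X61/X61s: 2.06 (7NETA6WW), comment 133
}


def read_dmi_bios_version(path: str = "/sys/class/dmi/id/bios_version") -> Optional[str]:
    """Read the DMI BIOS version string, or return None if it is unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def parse_bios_id(s: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Split a BIOS ID string into (family, (major, minor)), or None if unrecognized."""
    m = BIOS_ID_RE.match(s.strip())
    if m is None:
        return None
    return m.group("family"), (int(m.group("major")), int(m.group("minor")))


def needs_update(s: str) -> Optional[bool]:
    """True if older than the reported-fixed version, False if not, None if unknown."""
    parsed = parse_bios_id(s)
    if parsed is None:
        return None
    family, version = parsed
    fixed = REPORTED_FIXED.get(family)
    if fixed is None:
        return None
    return version < fixed
```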
Comment 137 Len Brown 2008-02-27 22:51:41 UTC
Confirmed that Lenovo BIOS 2.10 fixes this issue on T61.

No, I'm not excited about adding DMI entries to Linux
to warn users to upgrade their BIOS -- as there are lots
of versions of the BIOS, and looking at the DMI for the
one I just loaded, it doesn't even match what IBM
calls the BIOS (DMI says 2.16, Lenovo calls it 2.10).

closed.
