Bug 43238

Summary: No interrupts on add-in PCI ethernet card - Intel H77
Product: ACPI
Reporter: Lubos Dolezel (lubosd)
Component: BIOS
Assignee: Len Brown (lenb)
Status: CLOSED DOCUMENTED
Severity: normal
CC: acpi-bugzilla, aklhfex, alan, bjorn, dap, feng.tang, galens, peter_ludwig
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.3.5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: lspci -v, dmesg and kernel .config
acpidump output
/proc/interrupts contents
lspci -v output
dmesg with acpi=off
/proc/interrupts with acpi=off
lspci -v with acpi=off
dmesg with acpi debugging ON
dmesg with PCI_DEBUG
e1000 debug patch
dmesg with patch to e1000 applied
acpidump of Intel dh77kc motherboard with version 0095 bios
original disassembly of DSDT table from acpidump.out
edited version of DSDT with corrected conventional PCI interrupt routing
dmesg, lspci -v, /proc/interrupts, and ifconfig after DSDT override on dh77kc

Description Lubos Dolezel 2012-05-12 15:30:36 UTC
Created attachment 73261 [details]
lspci -v, dmesg and kernel .config

Hello,

I have a new motherboard based on Intel H77, it's an Intel DH77KC. I needed to use an extra ethernet card, tried three different PCI cards, but none of them worked. It appears the driver is not receiving any interrupts. /proc/interrupts shows only zeroes for "eth1".

This causes the interface to have the NO-CARRIER flag even when mii-tool reports that a connection has been negotiated. And of course, the interface doesn't work.

What helps? Only acpi=off, other options either don't help or make the system unbootable. Turning ACPI off is however not acceptable as it disables all other CPU cores but the first one.

Lubos
Comment 1 Len Brown 2012-05-15 02:30:54 UTC
Please attach the output from acpidump

Please verify that you are running the latest BIOS,
and if not, upgrade, re-test, and attach the new acpidump
output along with the new output from lspci -v


The failing device is this one, yes?

05:01.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)


[    1.130160] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    1.130254] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    1.130365] e1000 0000:05:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
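
When comparing logs from several boots or slots, the routing line above can be pulled apart mechanically. The following is a minimal Python sketch, assuming the message format shown in this dmesg excerpt (other kernel versions may word it differently); the name `parse_pci_int` is made up for illustration.

```python
import re

# Matches lines such as:
#   "e1000 0000:05:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17"
# The pattern is an assumption based on the excerpt above.
PCI_INT_RE = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCI INT (?P<pin>[A-D]) -> GSI (?P<gsi>\d+) "
    r"\((?P<trigger>\w+), (?P<polarity>\w+)\) -> IRQ (?P<irq>\d+)"
)

def parse_pci_int(line):
    """Return the routing fields of a 'PCI INT' dmesg line, or None."""
    match = PCI_INT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["gsi"] = int(info["gsi"])
    info["irq"] = int(info["irq"])
    return info

line = "[    1.130365] e1000 0000:05:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17"
print(parse_pci_int(line))
```

The extracted GSI can then be checked against the matching row of /proc/interrupts.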

IRQ 17 is not on a programmable PCI interrupt link,
so that can't be the problem...

Conceivably it is simply hard-mapped in the PRT to the wrong IRQ...

What does /proc/interrupts show?
If you spray packets at the interface, does some
other device's interrupt count go up?

a random guess...
What happens if you boot with "pci=nomsi"
Comment 2 Lubos Dolezel 2012-05-16 18:50:51 UTC
pci=nomsi was one of the options I've tried with no success. I also tried pci=noacpi with no luck.

Yeah, it's the ethernet card at 0000:05:01.0. It appears it's not the only piece of PCI* hardware having problems. I've noticed my PCI-E DVB-S card (02:00.0) is also reporting some issues:

cx23885_wakeup: 5 buffers handled (should be 1)

I have the latest BIOS version available.
Comment 3 Lubos Dolezel 2012-05-16 18:51:14 UTC
Created attachment 73311 [details]
acpidump output
Comment 4 Lubos Dolezel 2012-05-16 18:51:38 UTC
Created attachment 73312 [details]
/proc/interrupts contents
Comment 5 Lubos Dolezel 2012-05-16 18:52:06 UTC
Created attachment 73313 [details]
lspci -v output
Comment 6 Lubos Dolezel 2012-05-16 19:11:35 UTC
I manually added an 'arp' entry on my laptop with the MAC address of the broken interface and tried a 'ping -i0' to send a lot of packets towards that interface.

I failed to notice any increased interrupt activity.
Comment 7 Feng Tang 2012-05-21 12:11:15 UTC
Hi Lubos,

Could you also post the working dmesg/lspci/proc_interrupts info when you use the "acpi=off"?

- Feng
Comment 8 Lubos Dolezel 2012-05-26 19:44:39 UTC
Created attachment 73408 [details]
dmesg with acpi=off
Comment 9 Lubos Dolezel 2012-05-26 19:44:57 UTC
Created attachment 73409 [details]
/proc/interrupts with acpi=off
Comment 10 Lubos Dolezel 2012-05-26 19:45:14 UTC
Created attachment 73410 [details]
lspci -v with acpi=off
Comment 11 Lubos Dolezel 2012-05-26 19:45:39 UTC
Created attachment 73411 [details]
dmesg with acpi debugging ON
Comment 12 Feng Tang 2012-06-05 08:49:17 UTC
Thanks for the new debug info, but I can't find a good clue yet; I wish I had that board too. I compared the DSDT PRT info with your new kernel messages and can't see anything wrong.

Could you try enabling CONFIG_PCI_DEBUG and appending "apic=debug" to the kernel command line?

- Feng
Comment 13 Lubos Dolezel 2012-06-12 09:41:42 UTC
I'll do that.

I'll also try installing FreeBSD and possibly Windows 7 to see if it works and if so, then what is different in interrupt routing tables.
Comment 14 Lubos Dolezel 2012-06-12 21:08:09 UTC
Hello again,

tonight I tried both FreeBSD and Windows.

Under FreeBSD, the card exhibits exactly the same erratic behavior. No packets seem to get through and after a while, the kernel watchdog tries to restart the device.

Under Windows 7, however, the card works just fine! I'm not very familiar with Windows internals, but the Device Manager shows that interrupt 16 is used. On Linux, interrupt 17 is used.

Interrupt 17 must be broken or something, since all devices on int 16 work even under Linux.

Is there any - no matter how hackish - way to force interrupt 16 for the card?
Comment 15 Lubos Dolezel 2012-06-12 21:13:31 UTC
Created attachment 73573 [details]
dmesg with PCI_DEBUG

This is the dmesg output with PCI_DEBUG enabled and apic=debug.
Comment 16 Feng Tang 2012-06-13 05:30:05 UTC
Created attachment 73574 [details]
e1000 debug patch

Good to know both Win 7 and acpi=off work. One weird thing is that the ACPI PRT table sets the IRQ to 17, while Win 7 used 16. Does Win 7 not use the ACPI routing table?

Could you try the attached patch, which ensures the IRQ is really enabled on the IOAPIC side?
Comment 17 Feng Tang 2012-06-14 04:57:23 UTC
Another test worth trying is to disable the working eth0 device (e.g. by disabling its e1000e driver) and see whether eth1 then works.
Comment 18 Lubos Dolezel 2012-06-14 17:44:15 UTC
Created attachment 73601 [details]
dmesg with patch to e1000 applied
Comment 19 Feng Tang 2012-06-15 06:29:41 UTC
(In reply to comment #18)
> Created an attachment (id=73601) [details]
> dmesg with patch to e1000 applied

Thanks for testing.

From the dmesg, the IOAPIC is set up OK. And the fact that Win 7 works without using the ACPI PRT table implies the ACPI FW may have a problem with the PRT table; I don't think IRQ 17 itself has a problem.

You can try to update BIOS for a further check.
Comment 20 Lubos Dolezel 2012-06-15 07:05:57 UTC
Oh yes, the BIOS update bricked the motherboard, so now it's up for RMA.

I'll probably try to convince the seller to take the board back and let me pick another model, but I'm not sure they'll like that idea. I'll keep you posted.
Comment 21 Lubos Dolezel 2012-06-15 07:21:59 UTC
Another idea is that Windows 7 uses UEFI to boot the system. Could that have any effect on interrupts?
Comment 22 Pallai Roland 2012-06-16 01:23:40 UTC
Hi,

Same problem here with KNC1 DVB-C PCI- and Promise TX4 SATA PCI cards.

DH77KC MoBo, Intel(R) Core(TM) i7-3770 CPU. BIOS is the latest, version: KCH7710H.86A.0095.2012.0608.1754, Release Date: 06/08/2012, factory settings.


/proc/interrupts sample:
[...]
 16:        135          3          4       3837         29          1          2      23840   IO-APIC-fasteoi   ehci_hcd:usb1, sata_promise
 18:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   saa7146 (0)
[...]
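
A line like the one for IRQ 18 above (a registered handler but zero counts on every CPU) is the signature of this bug. Below is a minimal Python sketch that flags such lines; it assumes the /proc/interrupts layout shown in this sample (numeric IRQ label, per-CPU counts, a single-token chip name, then the handler list), and the name `silent_irqs` is made up for illustration.

```python
def silent_irqs(text):
    """Return (irq, handlers) pairs whose per-CPU counts are all zero."""
    silent = []
    for raw in text.splitlines():
        label, sep, rest = raw.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the CPU header, named rows (NMI, LOC), elisions
        fields = rest.split()
        counts = []
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        # What remains is the chip name followed by the handler list.
        handlers = " ".join(fields[1:])
        if counts and sum(counts) == 0 and handlers:
            silent.append((int(label), handlers))
    return silent

# The two rows quoted in the sample above.
sample = """\
 16:        135          3          4       3837         29          1          2      23840   IO-APIC-fasteoi   ehci_hcd:usb1, sata_promise
 18:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   saa7146 (0)
"""
print(silent_irqs(sample))  # [(18, 'saa7146 (0)')]
```

On a live system the same function could be fed the contents of /proc/interrupts; a device that is merely idle will also show zeroes, so the check is only meaningful while traffic is being generated.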

dmesg sample:
Jun 16 10:58:41 dh kernel: [   46.708666] saa7146: register extension 'budget_av'
Jun 16 10:58:41 dh kernel: [   46.708717] saa7146: found saa7146 @ mem ffffc900066ba000 (revision 1, irq 18) (0x1894,0x0022)
Jun 16 10:58:41 dh kernel: [   46.708718] saa7146 (0): dma buffer size 192512
Jun 16 10:58:41 dh kernel: [   46.708719] DVB: registering new adapter (KNC1 DVB-C MK3)
Jun 16 10:58:41 dh kernel: [   46.738349] saa7146: saa7146 (0) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
Jun 16 10:58:41 dh kernel: [   46.760261] saa7146: saa7146 (0) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
Jun 16 10:58:41 dh kernel: [   46.782207] saa7146: saa7146 (0) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
Jun 16 10:58:41 dh kernel: [   46.804148] saa7146: saa7146 (0) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
[...]


How can I help you?
Comment 23 Galen Seitz 2012-07-12 04:55:15 UTC
I'm seeing a problem too.

i7-3770, DH77KC with bios 0095, and the latest CentOS 6 kernel(2.6.32-279.1.1).

I've tried both a Symbios 53C895 SCSI controller and a National DP83815 ethernet controller.  The SCSI driver hangs for a very long time at boot.  I suspect that it is attempting to scan the bus, but the driver is not receiving interrupts.  The ethernet controller fails to get an IP with DHCP.  If I boot the kernel with noapic and irqpoll, the ethernet controller seems to function properly.  I haven't tried to confirm that the SCSI controller functions with the same kernel arguments.  In both cases the card is installed in what the bios refers to as slot 3.  No other slots are populated.

If there is anything I can do to help track down this problem, please let me know.  I can capture some of the same data as Lubos and/or build a new kernel if necessary.

BTW, I'm in the Portland, Oregon area, so if there's someone working in Hillsboro that's interested in chasing this, I could make my machine available.

galen

FWIW, the bios also seems to have issues regarding memory SPD configuration.
http://communities.intel.com/message/157553
Comment 24 Lubos Dolezel 2012-07-12 06:58:49 UTC
I actually RMA'd the board, got my money back and bought an Asus Z77-based board instead.

That happened after a botched BIOS update (Intel's fault, apparently) and BIOS recovery did not work (another problem with the board). I'd probably recommend you to get rid of it, too.

Someone at Intel really messed up!
Comment 25 Feng Tang 2012-07-13 04:21:47 UTC
(In reply to comment #24)
> I actually RMA'd the board, got my money back and bought an Asus Z77-based
> board instead.
> 

I've been trying to reproduce the bug here, but I can't find an Intel H77 board,
only a similar Asus P8Z77-V LK board, and all my PCI ethernet cards work _fine_
on that P8Z77-V LK board.

So it seems the H77 board on which you saw issues uses a not-so-good PCIe-PCI bridge, which causes this problem.
Comment 26 Galen Seitz 2012-07-13 06:16:17 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > I actually RMA'd the board, got my money back and bought an Asus Z77-based
> > board instead.
> > 
> 
> I've been trying to reproduce the bug here, and can't find a Intel H77 board
> but a similar Asus P8Z77-V LK board, and all my PCI ethernet cards work
> _fine_
> on  that P8Z77-VLK board.
> 
> So seems the H77 board that you saw issues uses a not so good PCIE-PCI bridge
> which cause this problem.

Based on the photo on Newegg, it appears your ASUS motherboard uses an ASMedia ASM1083 PCIe-PCI bridge.  Given the following LKML thread, it seems that you are lucky it works at all.
https://lkml.org/lkml/2012/1/30/216

Anyway, the DH77KC uses an ITE IT8892E PCIe-PCI bridge.  This chip seems to be used on some Gigabyte boards, as well as the older Intel DH67BL.  The only bug discussion I can find regarding the IT8892E is this old thread on the CentOS mailing list.
http://lists.centos.org/pipermail/centos/2011-January/104571.html
The relative absence of bug discussions on the net leads me to think that perhaps it is just a problem in the bios on the DH77KC.  Do you think it is realistic to think that we can convince someone to fix the bios?  I would hate to spend significant time tracking this down, only to have the bios developers ignore us.  Please let me know whether you think this is worth pursuing.
Comment 27 Galen Seitz 2012-07-17 08:30:56 UTC
Created attachment 75521 [details]
acpidump of Intel dh77kc motherboard with version 0095 bios
Comment 28 Galen Seitz 2012-07-17 08:34:43 UTC
Created attachment 75531 [details]
original disassembly of DSDT table from acpidump.out
Comment 29 Galen Seitz 2012-07-17 09:01:12 UTC
Created attachment 75541 [details]
edited version of DSDT with corrected conventional PCI interrupt routing

I believe I have determined the source of the problem.  The DSDT description of the interrupt routing for the conventional PCI slots seems to be incorrect.  

Silkscreen/BIOS    orig INTA    corrected INTA
PCI 1/Slot 3           16              19
PCI 2/Slot 2           17              16
PCI 3/Slot 1           18              17
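
Read as numbers, the table above is a rotation by one position within GSIs 16-19. The Python sketch below only restates the three measured rows; treating them as a modulo-4 rotation is an interpretation of the table, not something taken from the DSDT, and the names `BASE_GSI` and `corrected_gsi` are made up for illustration.

```python
BASE_GSI = 16  # first of the four GSIs (16..19) involved in the table

# (original INTA GSI, corrected INTA GSI) for the three slots in the table
ORIG_TO_CORRECTED = {16: 19, 17: 16, 18: 17}

def corrected_gsi(orig):
    """Rotate the GSI back by one position within the 16..19 window."""
    return BASE_GSI + (orig - BASE_GSI - 1) % 4

for orig, fixed in ORIG_TO_CORRECTED.items():
    assert corrected_gsi(orig) == fixed
    print(f"orig INTA {orig} -> corrected INTA {fixed}")
```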

I built a 3.4.4 kernel with an assembled version of the attached dsdt.dsl file.  With this DSDT override, I was able to use my old natsemi ethernet card in each of the three PCI slots.  Also, my Symbios SCSI card no longer hangs while trying to scan the SCSI bus.

Note that iasl version 20120711-64 generates several warnings and errors when compiling the original dsdt.dsl file.  I did not attempt to correct these.  Also note that this has seen *very* limited testing.  I did try both apic and noapic, and they both seemed to work.  Finally, keep in mind that this ACPI stuff is completely new to me, so I very easily could have made errors.

galen
Comment 30 Feng Tang 2012-07-17 09:10:18 UTC
Great to know it works now :)

Is your machine using the Asus xxH77xx motherboard? Could you post the lspci and dmesg?

Also, how did you get the correct IRQ numbers? Just by blindly trying every possible combination?
Comment 31 Galen Seitz 2012-07-17 09:27:50 UTC
(In reply to comment #30)
> Great to know it works now :)
> 
> Is your machine using the Asus xxH77xx mother board? could you post the lspci
> and dmesg?

No.  As I mentioned earlier, I have an Intel DH77KC motherboard, which uses an ITE IT8892E PCIe-PCI bridge.  I'll generate an attachment with the requested info.  


> Also how do you get the correct IRQ number? just blindly try and possible
> combination?

First I mapped out the interrupt signals on the PCI slots using a multimeter.  Then I searched everywhere for a datasheet for the IT8892E, but no luck.  My main clue was that when I installed the ethernet card in slot 2, the kernel said the card was assigned to irq 17, but when I connected the ethernet cable, the kernel gave me an 'int 16: nobody cares' message.  At that point I was fairly certain it was a swizzle problem.  Then some educated guesses, along with some trial and error, and I came up with what I think is the correct routing.  The hardest part by far was trying to digest the contents of the dsdt.dsl file, although I think having the IT8892E datasheet would have saved a fair amount of trouble.  That of course assumes that the datasheet wasn't the source of the errors in the first place.
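
For readers unfamiliar with the term: the "swizzle" is the conventional rotation of INTx pins behind a PCI-to-PCI bridge, where a device's interrupt pin is rotated by its device number before it appears on the bridge's upstream side. Below is a minimal Python sketch of that standard rule with pins numbered 1-4 for INTA-INTD; the device numbers in the loop are examples, not values read from this board, and the name `swizzle` is chosen here for illustration.

```python
def swizzle(pin, device):
    """Upstream INTx pin (1=INTA .. 4=INTD) for a device behind a bridge.

    pin    -- the pin the device itself asserts (1..4)
    device -- the device number on the bridge's secondary bus
    """
    return ((pin - 1 + device) % 4) + 1

PIN_NAMES = {1: "INTA", 2: "INTB", 3: "INTC", 4: "INTD"}

# INTA from neighbouring device numbers lands on successive upstream pins,
# which is why a routing table that is off by one device sends a card's
# interrupt to its neighbour's IRQ.
for device in range(4):
    print(device, PIN_NAMES[swizzle(1, device)])
```

An off-by-one in this rotation would be consistent with the symptom described above, where the kernel expected irq 17 but the interrupt arrived on 16.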

galen
Comment 32 Galen Seitz 2012-07-17 09:45:46 UTC
Created attachment 75571 [details]
dmesg, lspci -v, /proc/interrupts, and ifconfig after DSDT override on dh77kc

natsemi ethernet card installed at silkscreen PCI 3/bios slot 1.
symbios SCSI card installed at silkscreen PCI 1/bios slot 3.
Comment 33 Feng Tang 2012-07-17 13:51:05 UTC
(In reply to comment #31)

> 
> First I mapped out the interrupt signals on the PCI slots using a multimeter.
> Then I searched everywhere for a datasheet for the IT8892E, but no luck.  My
> main clue was that when I installed the ethernet card in slot 2, the kernel
> said the card was assigned to irq 17, but when I connected the ethernet cable,
> the kernel gave me an 'int 16: nobody cares' message.  At that point I was
> fairly certain it was a swizzle problem.  Then some educated guesses, along
> with some trial and error, and I came up with what I think is the correct
> routing.  The hardest part by far was trying to digest the contents of the
> dsdt.dsl file, although I think having the IT8892E datasheet would have saved
> a fair amount of trouble.  That of course assumes that the datasheet wasn't the
> source of the errors in the first place.

Nice job!

So the wrong PRT table is the root cause. Furthermore, for those problematic Asus P8Z77xxx boards, people could also try modifying the PRT table as you did, shifting all the IRQ numbers by 1.

- Feng
Comment 34 Bjorn Helgaas 2012-07-17 19:21:03 UTC
I agree: nice job of debugging!

Lubos reported that the card works perfectly under Windows 7 on the DH77KC (comment #14).  It seems like there must be some DH77KC-specific Windows quirk, or else Windows handles _PRT differently than Linux does.  If the latter, a generic change to Linux might fix this and other issues as well.
Comment 35 Lubos Dolezel 2012-07-17 19:31:19 UTC
I think Windows handles _PRT differently, because I didn't install any drivers from Intel and the computer had no connection to the Internet. And since the mobo was designed after W7 was released, there can't possibly be any specific hack for DH77KC present in there.
Comment 36 Bjorn Helgaas 2012-07-30 17:13:11 UTC
> I was pointed to https://bugzilla.kernel.org/show_bug.cgi?id=43238.
> I tried the modifications to the DSDT that were proposed there and
> voilà, the 3c905c started to work :)

I don't know where to go with this.  We do have some _PRT quirks in
drivers/acpi/pci_irq.c, but since Windows 7 works fine without any
quirks, adding a quirk to Linux doesn't seem like the "right" fix.

Lubos, would it be possible for you to collect a Windows system report
using AIDA64?  I think there's a free trial version here:
http://www.aida64.com/

Maybe somebody can compare what Windows does with what Linux does and
figure out the difference.
Comment 37 Feng Tang 2012-08-01 07:52:51 UTC
(In reply to comment #36)
 
> I don't know where to go with this.  We do have some _PRT quirks in
> drivers/acpi/pci_irq.c, but since Windows 7 works fine without any
> quirks, adding a quirk to Linux doesn't seem like the "right" fix.

Maybe Win7 already knows about this and has a quirk for it; new HW is usually heavily tested on Windows before going to market.

One weird thing is that the lspci outputs from Lubos and Galen Seitz are different: for the devices in the external slots, Lubos' are at 00:05:0x and the others are at 00:03.0x, which makes a quirk hard to write.

Galen and Lubos, could you provide your dmidecode info? Thanks.
Comment 38 Bjorn Helgaas 2012-08-01 13:54:05 UTC
> Maybe Win7 already know this and has a quirk for that, usually new HW will be
> heavily tested on Windows before going to market.

In comment #35, Lubos points out that it's unlikely Windows could have
a quirk because Win7 was released before the motherboard and he
installed no Intel drivers.
Comment 39 Lubos Dolezel 2012-08-01 13:59:25 UTC
Yeah, I doubt it's a quirk. By looking at the interrupt table on Win7, I think Windows handles IRQs completely differently. Maybe it doesn't look at _PRT at all, it just thinks "hey, there's an IT8892E, so let's build the interrupt table this way" - if that's possible.

But as I mentioned earlier, I RMA'd the mobo and got my money back. So I can't assist you any further with any sorts of dumps. I can only tell you that ASUS Z77-based mobos seem to work fine under Linux :-)
Comment 40 Alan 2012-09-06 11:42:03 UTC
Oh well, thanks for the digging

Closing
Comment 41 Galen Seitz 2012-09-29 05:40:03 UTC
After installing Intel's latest BIOS update(version 0100, 9/6/2012) for the DH77KC motherboard, the interrupt routing problem appears to be fixed.  With the updated BIOS, the interrupts are now assigned in the same way as my earlier 'corrected INTA' assignments.

Silkscreen/BIOS        INTA
PCI 1/Slot 3            19
PCI 2/Slot 2            16
PCI 3/Slot 1            17

BTW, it looks like slot 2 is sharing irq 16 with one of the usb controllers, so it might be preferable to use slots 1 and 3 before slot 2.

It was frustrating to wait so long, but I'm happy that Intel fixed the problem.  They also fixed the memory configuration problem, as well as an EFI problem.  I've finally got CentOS 6.3 up and running.
Comment 42 Pallai Roland 2012-09-29 13:03:05 UTC
Galen, thanks for the report. The BIOS upgrade fixed my problem too.
Comment 43 Peter Ludwig 2014-08-16 18:03:39 UTC
For me even the newest BIOS did not change too much. I had to put my ethernet card into slot 3 (the Slot close to the edge) and my additional FireWire card into Slot No 1. (Leave Slot 2 free!)

As soon as I put the ethernet card into Slot No 2 and the FW card into Slot No 3 I can control the download speed with my mouse. ^^

But thanks for the Tip.