Distribution: Debian testing
Hardware Environment: Epox 8kra2+ with Athlon CPU (chipset: KT600). Various hardware configurations, from just a simple VGA to a lot of cards.
Software Environment:
Problem Description:
I have several machines with this board. When I upgraded some of them from kernel 2.6.5 to 2.6.6, the onboard networking card (via-rhine compat) stopped working. However, the via-rhine driver has not changed from 2.6.5. The problem seems to be lost interrupts.

Today I added an RTL 8139 network card to one of these machines in order to do more tests, and suddenly onboard networking worked! However, the TV card stopped working. On closer investigation, it turns out that before (and on all other tested computers), the onboard via-rhine had been assigned IRQ12. Now it has been assigned IRQ7. In addition, before the network card was added, if APIC was activated, the kernel complained that nobody cared about IRQ12. The same happens now, but with the TV card and the new network card instead of the onboard network card:

...
loop: loaded (max 8 devices)
via-rhine.c:v1.10-LK1.1.19-2.5 July-12-2003 Written by Donald Becker
http://www.scyld.com/network/via-rhine.html
eth0: VIA VT6102 Rhine-II at 0xdc00, 00:04:61:4d:3d:24, IRQ 7.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 45e1.
8139too Fast Ethernet driver 0.9.27
eth1: RealTek RTL8139 at 0xf8823000, 00:08:a1:58:9c:61, IRQ 12
eth1: Identified 8139 chip type 'RTL-8100B/8139D'
Universal TUN/TAP device driver 1.5 (C)1999-2002 Maxim Krasnyansky
Linux video capture interface: v1.00
bttv: driver version 0.9.14 loaded
bttv: using 8 buffers with 2080k (520 pages) each for capture
bttv: Bt8xx card found (0).
bttv0: Bt878 (rev 2) at 0000:00:0b.0, irq: 12, latency: 32, mmio: 0xeb005000
bttv0: detected: Terratec TValue (Temic PAL B/G) [card=33], PCI subsystem ID is
bttv0: using: Terratec TerraTValue Version Bt878 [card=33,autodetected]
irq 12: nobody cared!
Call Trace:
 [<c010610a>] __report_bad_irq+0x2a/0x90
 [<c01061fc>] note_interrupt+0x6c/0xa0
 [<c01064d1>] do_IRQ+0x121/0x130
 [<c0104834>] common_interrupt+0x18/0x20
 [<c011a870>] __do_softirq+0x30/0x80
 [<c011a8e6>] do_softirq+0x26/0x30
 [<c01064ad>] do_IRQ+0xfd/0x130
 [<c0104834>] common_interrupt+0x18/0x20
 [<c0106a0e>] setup_irq+0x7e/0xf0
 [<c026bde0>] bttv_irq+0x0/0x350
 [<c01065a3>] request_irq+0x83/0xd0
 [<c026c771>] bttv_probe+0x351/0x6c0
 [<c026bde0>] bttv_irq+0x0/0x350
 [<c015cfe2>] dput+0x22/0x210
 [<c020a502>] pci_device_probe_static+0x52/0x70
 [<c020a55b>] __pci_device_probe+0x3b/0x50
 [<c020a59c>] pci_device_probe+0x2c/0x50
 [<c0246a8f>] bus_match+0x3f/0x70
 [<c0246bb9>] driver_attach+0x59/0x90
 [<c0246e5d>] bus_add_driver+0x8d/0xa0
 [<c024729f>] driver_register+0x2f/0x40
 [<c020a78c>] pci_register_driver+0x5c/0x90
 [<c026cf8f>] bttv_init_module+0x9f/0x110
 [<c05307bb>] do_initcalls+0x2b/0xc0
 [<c0125677>] init_workqueues+0x17/0x60
 [<c01002e0>] init+0x0/0x160
 [<c0100315>] init+0x35/0x160
 [<c010208c>] kernel_thread_helper+0x0/0x14
 [<c0102091>] kernel_thread_helper+0x5/0x14
handlers:
[<c026bde0>] (bttv_irq+0x0/0x350)
Disabling IRQ #12
bttv0: gpio: en=00000000, out=00000000 in=00ffefff [init]
...

The problem is also present without APIC support, but then the device fails silently, with no messages (except network errors). My conclusion is that somehow IRQ12 does not work with 2.6.6 and the KT600 mainboard.

Steps to reproduce: No idea. Get a mainboard with KT600 and try until some device gets assigned IRQ12? I am, however, willing to help in testing this.
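For anyone comparing saved dmesg files from several kernels, here is a minimal sketch (not part of the original report) that pulls out the "nobody cared" complaints quoted above. The function name `find_unhandled_irqs` and the `SAMPLE` excerpt are illustrative assumptions; the regular expressions only cover the message formats shown in this report and may need adjusting for other kernel versions.

```python
import re

# Patterns based only on the kernel messages quoted in this report.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared", re.IGNORECASE)
DISABLED = re.compile(r"Disabling IRQ #(\d+)")
# Handler entries are printed as "[<addr>] (name+0x...)", unlike the
# call-trace entries, which have no parentheses around the symbol.
HANDLER = re.compile(r"\[<[0-9a-f]+>\] \((\w+)\+0x")


def find_unhandled_irqs(dmesg_text):
    """Return {irq: {"disabled": bool, "handlers": [names]}} per complaint."""
    report = {}
    current = None  # IRQ whose complaint block is currently being read
    for line in dmesg_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            current = int(match.group(1))
            report.setdefault(current, {"disabled": False, "handlers": []})
            continue
        match = DISABLED.search(line)
        if match:
            irq = int(match.group(1))
            report.setdefault(irq, {"disabled": False, "handlers": []})
            report[irq]["disabled"] = True
            current = None
            continue
        if current is not None:
            match = HANDLER.search(line)
            if match:
                report[current]["handlers"].append(match.group(1))
    return report


# Shortened excerpt of the log above (hypothetical test input).
SAMPLE = """\
bttv0: using: Terratec TerraTValue Version Bt878 [card=33,autodetected]
irq 12: nobody cared!
Call Trace:
 [<c010610a>] __report_bad_irq+0x2a/0x90
 [<c01061fc>] note_interrupt+0x6c/0xa0
handlers:
[<c026bde0>] (bttv_irq+0x0/0x350)
Disabling IRQ #12
"""

if __name__ == "__main__":
    print(find_unhandled_irqs(SAMPLE))
```

Running it on a full dmesg from each kernel would show at a glance which IRQ was disabled and which driver's handler was registered on it at the time.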
Guessing that you're running with ACPI enabled... Are you running 2.6.6 without the patch for bug 2665? Can you attach the full dmesg?
O.k., reading bug 2665 gave me some more ideas to test (dmesgs of all below/attached). It seems the patch for 2665 did not make it into 2.6.7-rc1 (same problem as 2.6.6), so I also tested with 2.6.7-rc2 and got a new error pattern:

2.6.5: IRQ12 is not used, devices work.

2.6.6: IRQ12 is used and works neither with the second ethernet card (then IRQ 12 is assigned to the TV card) nor without the second ethernet card (then IRQ 12 is assigned to the onboard network card). In both cases there are "IRQ 12: nobody cared" messages in the syslog. In the first case TV does not work, in the second case onboard ethernet does not work, i.e. the device on IRQ12 fails.

2.6.6 with pci=noacpi: IRQ 12 not used, devices work.

2.6.7-rc2: IRQ12 not used.
With second ethernet card: TV, network, etc. work, however audio has problems (these also turn up under 2.6.5, and are likely unrelated to the main problem discussed here).
Without second ethernet card: Onboard ethernet does not work, but there are no "IRQ xx: nobody cared" messages in the syslog. When doing a "dhclient eth0" I get the following error in the syslog:

Jun 7 21:30:48 debian dhclient: Internet Software Consortium DHCP Client 2.0pl5
Jun 7 21:30:48 debian dhclient: Copyright 1995, 1996, 1997, 1998, 1999 The Internet Software Consortium.
Jun 7 21:30:48 debian dhclient: All rights reserved.
Jun 7 21:30:48 debian dhclient:
Jun 7 21:30:48 debian dhclient: Please contribute if you find this software useful.
Jun 7 21:30:48 debian dhclient: For info, please visit http://www.isc.org/dhcp-contrib.html
Jun 7 21:30:48 debian dhclient:
Jun 7 21:30:48 debian kernel: eth0: Promiscuous mode enabled.
Jun 7 21:30:49 debian dhclient: Listening on LPF/eth0/00:04:61:4d:3d:24
Jun 7 21:30:49 debian dhclient: Sending on LPF/eth0/00:04:61:4d:3d:24
Jun 7 21:30:49 debian dhclient: Sending on Socket/fallback/fallback-net
Jun 7 21:30:49 debian kernel: eth0: Promiscuous mode enabled.
Jun 7 21:30:49 debian ntop[727]: **ERROR** Reading packets on device 0(eth0): 'recvfrom: Network is down'
Jun 7 21:30:49 debian ntop[727]: THREADMGMT: pcap dispatch thread terminated. ..
Jun 7 21:30:49 debian kernel: eth0: Promiscuous mode enabled.
Jun 7 21:30:49 debian kernel: eth0: Setting full-duplex based on MII #1 link partner capability of 45e1.
Jun 7 21:30:49 debian kernel: eth0: Promiscuous mode enabled.
Jun 7 21:30:49 debian kernel: eth0: Promiscuous mode enabled.
Jun 7 21:30:50 debian dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
Jun 7 21:30:50 debian dhclient: receive_packet failed on eth0: Network is down
Jun 7 21:30:51 debian dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
Jun 7 21:30:56 debian dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 11

2.6.7-rc2 with pci=noacpi: IRQ12 not used, devices work and dhclient works.

All in all, I have the impression that there is a connection with bug 2665, but it does not seem to be the only problem. There were some changes in the via-rhine driver from 2.6.7-rc1 to -rc2, but they might have been cosmetic, I am not sure.
Created attachment 3106 [details] dmesg 2.6.5, no second ethernet card
Created attachment 3107 [details] dmesg 2.6.6 second ethernet card
Created attachment 3108 [details] dmesg 2.6.7-rc2 second ethernet card
Created attachment 3109 [details] dmesg 2.6.7-rc2 no second ethernet card (eth0 silent broken)
Created attachment 3110 [details] dmesg 2.6.7-rc2 no second ethernet, pci=noacpi, things work
same in 2.6.9?
Not quite:
- 2.6.9 with pci=noapic works fine.
- 2.6.9 without pci=noapic assigns IRQ11 to the VIA NIC and has the NIC fail silently, i.e. "link is down" on call to dhclient.

I don't have the second NIC in the system anymore.

2.6.9 without pci=noapic:
root ~>cat /proc/interrupts
           CPU0
  0:     505298          XT-PIC  timer
  1:       1460          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  7:      10980          XT-PIC  bttv0
  8:          4          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:       4431          XT-PIC  ide2, ide3, uhci_hcd, uhci_hcd
 11:       6811          XT-PIC  aic7xxx, ohci1394, ehci_hcd, uhci_hcd, uhci_hcd, ES1938, eth0
 12:         87          XT-PIC  i8042
 14:      21593          XT-PIC  ide0
 15:         29          XT-PIC  ide1
NMI:          0
LOC:     505048
ERR:         23
MIS:          0

2.6.9 with pci=noapic:
root ~>cat /proc/interrupts
           CPU0
  0:     379731          XT-PIC  timer
  1:       1258          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:      80953          XT-PIC  uhci_hcd, uhci_hcd, eth0
  7:       1449          XT-PIC  bttv0
  8:          4          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:       2582          XT-PIC  ide2, ide3, uhci_hcd, uhci_hcd
 11:         82          XT-PIC  aic7xxx, ohci1394, ehci_hcd, ES1938
 12:         66          XT-PIC  i8042
 14:       2566          XT-PIC  ide0
 15:         29          XT-PIC  ide1
NMI:          0
LOC:     379572
ERR:         24
MIS:          0

Any tests I can run for you?
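To make the difference between the two listings easier to spot, here is a small sketch (an illustration, not something run as part of this report) that parses /proc/interrupts text and reports which IRQ a device sits on. The helper names `parse_interrupts` and `irq_of` are made up for this example, and the parser assumes the single-CPU layout shown above (one count column, a controller name without spaces, then a comma-separated device list).

```python
def parse_interrupts(text):
    """Map IRQ number -> list of device names from /proc/interrupts text.

    Assumes one CPU column, as in the listings in this comment.
    """
    table = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skips the CPU0 header and the NMI/LOC/ERR/MIS rows
        fields = rest.split(None, 2)  # count, controller, device list
        devices = fields[2].split(",") if len(fields) == 3 else []
        table[int(head)] = [name.strip() for name in devices]
    return table


def irq_of(table, device):
    """Return the lowest IRQ that lists the device, or None if absent."""
    for irq, devices in sorted(table.items()):
        if device in devices:
            return irq
    return None


# Excerpts of the two 2.6.9 listings above (without / with the boot option).
WITHOUT_OPTION = """\
           CPU0
 10:       4431          XT-PIC  ide2, ide3, uhci_hcd, uhci_hcd
 11:       6811          XT-PIC  aic7xxx, ohci1394, ehci_hcd, uhci_hcd, uhci_hcd, ES1938, eth0
 12:         87          XT-PIC  i8042
NMI:          0
"""

WITH_OPTION = """\
           CPU0
  5:      80953          XT-PIC  uhci_hcd, uhci_hcd, eth0
 11:         82          XT-PIC  aic7xxx, ohci1394, ehci_hcd, ES1938
 12:         66          XT-PIC  i8042
NMI:          0
"""

if __name__ == "__main__":
    for label, text in (("without", WITHOUT_OPTION), ("with", WITH_OPTION)):
        table = parse_interrupts(text)
        irq = irq_of(table, "eth0")
        print(f"{label} option: eth0 on IRQ {irq}, shared with {table[irq]}")
```

On the excerpts above this shows eth0 on IRQ 11 without the option and on IRQ 5 with it, which matches what can be read off the listings by hand.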
I suppose this is another VIA quirk issue (the broken device is an onboard device, right?). Please attach your 'lspci -vn' output, and tell me which device is broken (its PCI ID in the lspci output, such as 02:02.0).
Created attachment 4036 [details] 'lspci -vn' 2.6.9 without pci=noapic, NIC (00:12.0) silent failure
Yes, it is the onboard NIC now. But see my earlier reports also. I had a case where the TV card, which is definitely not onboard, did not work, but the onboard NIC was fine. So I think it is not the NIC, but the interrupt handling system. In my present configuration, the thing that does not work unless "pci=noapic" is given is the onboard NIC, a VIA Rhine II compatible VT6103 directly mounted on the mainboard. Its PCI ID is 00:12.0. Output from lspci -vn is below.
Don't know why the add-in TV card did not work, but please try the small patch.

diff -puN drivers/pci/quirks.c~quirk-test drivers/pci/quirks.c
--- 2.6/drivers/pci/quirks.c~quirk-test 2004-11-16 15:42:27.488239472 +0800
+++ 2.6-root/drivers/pci/quirks.c 2004-11-16 15:47:00.440744384 +0800
@@ -497,6 +497,7 @@ DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_V
 DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irqpic );
 DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_6, quirk_via_irqpic );
 DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8233_5, quirk_via_irqpic );
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8233_7, quirk_via_irqpic );
 /*
I would add one more thing. IRQ 12 is generally used for the i8042, so I'm a little curious why ACPI assigned it to a PCI device. Is there still more than one device on IRQ 12 in the latest kernel?
See /proc/interrupts listing in Comment #9: Only the i8042 keyboard controller is on IRQ 12 in 2.6.9, as it should be. I will try your patch in the evening.
The patch does not apply, there is not a single "DECLARE_PCI_FIXUP_ENABLE(...)" in drivers/pci/quirks.c of 2.6.9. Maybe this needs 2.6.10-<something>?
Created attachment 4042 [details] patch
Oh, my bad. The inline text was wrapped. As for IRQ 12, great to know 2.6.9 does the right thing.
huh? From what I read above, 2.6.9 does NOT work unless "pci=noacpi" is given (I assume that "pci=noapic", which doesn't do anything, is a typo and you used "pci=noacpi").
You are correct on both counts. Of course I meant "pci=noacpi". I keep confusing the two, sorry. 2.6.9 has this silent failure without the option, as described above. Side note: 2.6.10 with "pci=noacpi" started producing packet loss like crazy after a while, so I did not use it except that once. I am now back on 2.6.9, which only seems to have a bug in the netfilter connection tracker (I assume) that I can tolerate for the moment.
same in 2.6.13?
Sorry, the board developed hardware problems and I had to replace it.