Bug 6419
Description
Sérgio M Basto
2006-04-20 16:23:03 UTC
Created attachment 7925 [details]
dmesg
Created attachment 7926 [details]
lspci -vvv
Created attachment 7927 [details]
cat /proc/interrupts
I tried the suggestion from dmesg and booted with the "irqpoll" option, but the system locks up after booting, loading X and typing some letters. I googled a bit to see if I could find a similar case. I found cases on all distros (Debian, Gentoo, SUSE, Red Hat, etc.), but none of them were on the ACPI mailing list.
Thanks in advance.
>PCI: Via IRQ fixup for 0000:00:12.0, from 5 to 2
Can you try the latest base kernel? This might be a VIA IRQ quirk issue that is
already fixed in the base kernel.
ifconfig eth0 down
rmmod via_rhine
Well, for now this has resolved the oops, so it may be a problem specific to the via_rhine kernel module.
After booting at:
Apr 22 17:42:17 localhost kernel:
and removing the via_rhine module, at:
Apr 22 23:15:00 localhost kernel: warning: many lost ticks.
Apr 22 23:15:00 localhost kernel: Your time source seems to be instable or some driver is hogging interupts
Apr 22 23:15:00 localhost kernel: rip mwait_idle+0x36/0x4a
Any clue?
I tried 2.6.17-rc2-git3 and the "irq: nobody cared" problem seems resolved. But I still have the fast clock issue on my Intel dual (64-bit) on a VIA motherboard.
Booting with no_timer_check seems to stabilize the machine and the clock, and messages like "many lost ticks" no longer appear.
Hi, I have updated my kernel to 2.6.16-1.2153, which is effectively kernel-2.6.17-rc2-git5. More importantly, I found a better kernel boot parameter that seems to give great stability: notsc
Created attachment 7945 [details]
2.6.17-rc2-git5 boot with report_lost_ticks and notsc
The IRQ re-naming code has made what VIA thinks is IRQ23 into IRQ18:
via-rhine.c:v1.10-LK1.2.0-2.6 June-10-2004 Written by Donald Becker
GSI 18 sharing vector 0xC9 and IRQ 18
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 23 (level, low) -> IRQ 201
PCI: Via IRQ fixup for 0000:00:12.0, from 5 to 9
eth0: VIA Rhine II at 0xf7fffc00, 00:13:8f:6e:8f:c5, IRQ 201.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 0021.
This might be confusing the VIA quirk on the failing kernel,
and perhaps that quirk was fixed on the working kernels:
> I had update my kernel to 2.6.16-1.2153 which is the same to say
> kernel-2.6.17-rc2-git5
So both of these work properly with no cmdline parameters?
If yes, why do you need "notsc", and what bad things happen
when you don't use it?
>> I had update my kernel to 2.6.16-1.2153 which is the same to say
>> kernel-2.6.17-rc2-git5
>So both of these work properly with no cmdline parameters?
Yes, Red Hat kernels are very close to the base kernel; they are just a way of compiling the kernel without much trouble.
>If yes, why do you need "notsc", and what bad things happen
>when you don't use it?
notsc did the trick. The bad things without notsc are problems like lost ticks and the fast clock issue. Another related problem was the keyboard: sometimes when I pressed a key, the same character appeared 3 or 4 times, which was very annoying.
< Kernel command line: ro root=LABEL=/1
---
> Kernel command line: ro root=LABEL=/1 report_lost_ticks notsc
53,55c52,55
< PID hash table entries: 4096 (order: 12, 131072 bytes)
< time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
< time.c: Detected 2793.150 MHz processor.
---
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> Disabling vsyscall due to use of PM timer
> time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
> time.c: Detected 2793.051 MHz processor.
notsc changes something with the timer!
Thanks
>> I had update my kernel to 2.6.16-1.2153 which is the same to say
>> kernel-2.6.17-rc2-git5
>So both of these work properly with no cmdline parameters?
Sorry: for me, kernel-2.6.16-1.2153 and kernel-2.6.17-rc2-git5 count as the same kernel. Do they work well without boot options? No, they don't; they need "notsc".
Created attachment 8022 [details]
dmesg | grep -i lost without notsc, after a few minutes of uptime
Without notsc in the boot options, after a few minutes of uptime.
Well, with 2.6.17-rc3-git11 and notsc the computer seems stable. After this weekend of tests, I'd like to point out some patches that have gone into the kernel:
http://lkml.org/lkml/2006/4/19/16 (only entered the git snapshots after rc3, and works great for my VIA8237)
http://lkml.org/lkml/2005/8/13/30 the second part of this patch is made obsolete by the first one
http://lkml.org/lkml/2006/3/11/83 this patch gives me: PCI: Unexpected Value in PCI-Register : no Change! so the messages should be nicer
http://lkml.org/lkml/2004/11/16/19 this one gives me: sata_via 0000:00:0f.0: routed to hard irq line 11. Since the rest is dev_printk(KERN_DEBUG, I think I can't see whether any quirk is being applied or not.
Created attachment 8075 [details]
dmesg of 2.6.16-1.2195_FC6.root, which is based on kernel-2.6.17-rc3-git11
Well, one more day and I still haven't found any problem!
Created attachment 8143 [details]
Linux 2.6.16-1.2202 x86_64 dmesg
Well, I found a problem.
After installing the closed-source nvidia kernel modules, I again got the same
problem with the ethernet:
irq 201: nobody cared (try booting with the "irqpoll" option)
Call Trace: <IRQ> <ffffffff802aee6e>{__report_bad_irq+48}
<ffffffff802af06c>{note_interrupt+433} <ffffffff802ae985>{__do_IRQ+189}
<ffffffff8026e086>{do_IRQ+60} <ffffffff80259fff>{mwait_idle+0}
<ffffffff80260252>{ret_from_intr+0} <EOI>
<ffffffff80259fff>{mwait_idle+0}
<ffffffff80264983>{thread_return+0} <ffffffff8025a035>{mwait_idle+54}
<ffffffff8024b9f0>{cpu_idle+151} <ffffffff806b3825>{start_kernel+502}
<ffffffff806b3298>{_sinittext+664}
handlers:
[<ffffffff8817cb08>] (rhine_interrupt+0x0/0xae2 [via_rhine])
Disabling IRQ #201
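For anyone comparing several dmesg attachments, a rough sketch like the following could pull out which IRQ was reported as "nobody cared", whether it was disabled, and which module's handler was registered on it. The patterns are taken from the log lines quoted in this report; the function name and the result layout are made up for illustration, and other kernel versions may print these lines differently.

```python
import re

# Patterns based on the log lines quoted in this bug report (assumption:
# other kernel versions may word these messages differently).
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
HANDLER = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")


def summarize_bad_irqs(log_text):
    """Return {irq: {"disabled": bool, "modules": [names]}} for each reported IRQ."""
    result = {}
    current = None  # IRQ of the "nobody cared" report currently being read
    for line in log_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            current = int(m.group(1))
            result.setdefault(current, {"disabled": False, "modules": []})
            continue
        m = HANDLER.search(line)
        if m and current is not None:
            module = m.group(2)
            if module not in result[current]["modules"]:
                result[current]["modules"].append(module)
            continue
        m = DISABLING.search(line)
        if m:
            irq = int(m.group(1))
            result.setdefault(irq, {"disabled": False, "modules": []})
            result[irq]["disabled"] = True
            current = None
    return result
```

It takes the saved dmesg text as a single string, so it can be run over each attachment in turn.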
The problem seems to be only that ethernet stops until the network is restarted; everything else seems to work well.
Created attachment 8543 [details]
Now with an x86_64 kernel based on 2.6.18-rc1-git4,
it ends with:
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host controller halted, very bad!
uhci_hcd 0000:00:10.1: HC died; cleaning up
usb 2-2: USB disconnect, address 2
PM: Removing info for No Bus:usbdev2.2_ep85
eth1: unregister 'cdc_ether' usb-0000:00:10.1-2, CDC Ethernet Device
PM: Removing info for usb:2-2:1.0
PM: Removing info for No Bus:usbdev2.2_ep81
PM: Removing info for No Bus:usbdev2.2_ep02
PM: Removing info for usb:2-2:1.1
PM: Removing info for No Bus:usbdev2.2
PM: Removing info for No Bus:usbdev2.2_ep00
PM: Removing info for usb:2-2
In reply to my last comment:
With rmmod uhci_hcd and modprobe uhci_hcd I can get the network back.
Kernel 2.6.18-rc4 resolves the USB problem from #19. Now I only have an interrupt problem when I use the closed-source nvidia driver. With the open-source nvidia driver the computer works perfectly, which makes me think the problem lies with the nvidia guys.
This looks like it is probably at root the same bug I'm seeing on my Averatec laptop (which according to lspci uses mostly VIA chips). For me it started somewhere in the later Ubuntu and Debian versions of the 2.6.15 kernels and has continued up to the most current 2.6.17 versions. With me, ndiswrapper is given IRQ 11, but as soon as X starts up (using the VIA Unichrome driver) I get the 'irq 11: nobody cared' message. The workaround is acpi=noirq. Please let me know if I should submit elsewhere, add more info, etc.
In reply to comment #23: please attach the usual things: dmesg, lspci -vvv and cat /proc/interrupts
Created attachment 9037 [details]
pci quirk via irq behaviour change V3
http://lkml.org/lkml/diff/2006/9/7/235/1
My 2 computers work better and stay more stable with this patch. I believe it is needed for the computers to work correctly.
Comment on attachment 9037 [details]
pci quirk via irq behaviour change V3
http://lkml.org/lkml/diff/2006/9/7/235/1
===================================================================
--- linux.orig/drivers/pci/quirks.c
+++ linux/drivers/pci/quirks.c
@@ -650,11 +650,43 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_V
  * Some of the on-chip devices are actually '586 devices' so they are
  * listed here.
  */
+
+static int via_irq_fixup_needed = -1;
+
+/*
+ * As some VIA hardware is available in PCI-card form, we need to restrict
+ * this quirk to VIA PCI hardware built onto VIA-based motherboards only.
+ * We try to locate a VIA southbridge before deciding whether the quirk
+ * should be applied.
+ */
+static const struct pci_device_id via_irq_fixup_tbl[] = {
+	{
+		.vendor		= PCI_VENDOR_ID_VIA,
+		.device		= PCI_ANY_ID,
+		.subvendor	= PCI_ANY_ID,
+		.subdevice	= PCI_ANY_ID,
+		.class		= PCI_CLASS_BRIDGE_ISA << 8,
+		.class_mask	= 0xffff00,
+	},
+	{ 0, },
+};
+
 static void quirk_via_irq(struct pci_dev *dev)
 {
 	u8 irq, new_irq;
 
-	new_irq = dev->irq & 0xf;
+	if (via_irq_fixup_needed == -1)
+		via_irq_fixup_needed = pci_dev_present(via_irq_fixup_tbl);
+
+	if (!via_irq_fixup_needed)
+		return;
+
+	new_irq = dev->irq;
+
+	/* Don't quirk interrupts outside the legacy IRQ range */
+	if (!new_irq || new_irq > 15)
+		return;
+
 	pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
 	if (new_irq != irq) {
 		printk(KERN_INFO "PCI: VIA IRQ fixup for %s, from %d to %d\n",
@@ -663,13 +695,7 @@ static void quirk_via_irq(struct pci_dev
 		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
 	}
 }
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_0, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_1, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_3, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_4, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irq);
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq);
 
 /*
  * VIA VT82C598 has its device ID settable and many BIOSes
Created attachment 9038 [details]
http://lkml.org/lkml/diff/2006/9/7/235/1
pci quirk via irq behaviour change V3
Please ignore comments #25 and #26.
My 2 computers work better and stay more stable with this patch. I believe it is needed for the computers to work correctly.
I built a vanilla 2.6.17.13, and the problem persists there. I'll attach more info (sorry, forgot the -vvv with that kernel, but will add with the 2.6.15 currently running).
Created attachment 9049 [details]
dmesg from the .13 kernel.
Created attachment 9050 [details]
dmesg from 2.6.17.13
Created attachment 9051 [details]
dmesg, interrupts and lspci from the 17.13 kernel
I combined all the info for convenience, separated by '*******' comments.
Jim,
1 - text/plan => text/plain
2 - Why are you trying to enable lapic? Don't! Please try removing the lapic option, because the interrupts are still not using the APIC (I think); the interrupts are on XT_PIC.
3 - Because the interrupts are on XT_PIC and some VIA PCI devices aren't quirked, you should try http://lkml.org/lkml/diff/2006/9/7/235/1 (pci quirk via irq behaviour change V3)
> 1 - text/plan => text/plain
Yeah, right.
> 2 - why you try enable lapic ?
Enabling lapic makes no difference in this case. As for why I enable it: because it is supposed to be *Better* in some way I know little about.
> try http://lkml.org/lkml/diff/2006/9/7/235/1 (pci quirk via irq behaviour change V3)
I've just pulled 2.6.18, will try that, and will consider the patch if the bug persists.
> > 2 - why you try enable lapic ?
> enabling lapic makes no difference in this case. As for why I enable it,
> because it is supposed to be *Better* in some way I know little about.
But isn't this machine meant to work without lapic (as the BIOS says)? I had one laptop that had problems because this was the default behavior:
http://www.pps.jussieu.fr/%7Ejch/software/presario/
http://sergiomb.no-ip.org/laptop/index.html
After much investigation we think that the lapic(s) aren't programmed at all.
This problem is not fixed (for me) with 2.6.18. I applied the recommended via-quirks patch (didn't patch clean, had to edit the second chunk in), and the problem appears to be solved. I will attach the usual info from before and after the patch.
Created attachment 9065 [details]
usual info from vanilla 2.6.18 (still broken)
Created attachment 9066 [details]
usual info from vanilla 2.6.18 + via quirk patch (works)
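To make it easier to see what the via quirk patch changes, here is a small model of the decision in quirk_via_irq() before and after the patch, written in Python purely for illustration. It only mirrors the arithmetic visible in the diff posted in this thread; the function names are invented, and it ignores the device-ID matching done by DECLARE_PCI_FIXUP_ENABLE.

```python
def old_via_irq_fixup(dev_irq, interrupt_line):
    """Pre-patch logic: mask dev->irq to 4 bits, whatever its value.

    Returns the value that would be written to PCI_INTERRUPT_LINE,
    or None if nothing would be written.
    """
    new_irq = dev_irq & 0xF
    return new_irq if new_irq != interrupt_line else None


def via_irq_fixup(dev_irq, interrupt_line, via_southbridge_present):
    """Patched logic, as read from the diff (illustrative model only)."""
    # The quirk is skipped entirely when no VIA ISA bridge is found.
    if not via_southbridge_present:
        return None
    # "Don't quirk interrupts outside the legacy IRQ range"
    if dev_irq == 0 or dev_irq > 15:
        return None
    return dev_irq if dev_irq != interrupt_line else None
```

With the numbers from the earlier log, old_via_irq_fixup(201, 5) gives 9, which is consistent with the "from 5 to 9" line printed for the device on IRQ 201, while the patched version leaves an IRQ such as 201 alone because it is above 15.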
Jim, so you need the patch. Please boot (with the patch) without any parameters (lapic, acpi=noirq) and report the results: you SHOULDN'T use lapic!!
Created attachment 9067 [details]
usual info and more, 2.6.18 with via patch (still broken)
Doh! Been using make-kpkg and update-grub too long, out of practice with vanilla kernels. I carefully rebuilt and reinstalled 2.6.18, with the patch (I have included pci/quirks.c for verification). Since the bug triggers for me when X starts up, which starts agpgart, I included lspci and interrupts from both before and after this point.
Funny: you have a VIA Rhine II and the oops happens at exactly 200000 interrupts. I'm beginning to suspect the problem is with the via_rhine driver.
Thanks
OK, I need another report:
vi /etc/X11/xorg.conf
Please comment out
# Load "dri"
and see if X starts without problems. Let's see whether it is a problem with via_agp.
Created attachment 9071 [details]
my xorg.conf
Yes, commenting out DRI stops the problem. I've attached my xorg.conf.
I'm using the via driver. I tried commenting out the EnableAGPDMA option for
that driver, but that had no effect. I'll also attach the X log files with
and without DRI, which might be useful.
Created attachment 9072 [details]
xorg.log with DRI enabled
Created attachment 9073 [details]
xorg.log without DRI enabled
I managed to get bcm43xx working, but the problem not only persists but gets worse: with DRI enabled, X hangs hard enough to require a hard reboot. So it appears that any network device using IRQ 11 (which seems to be where all my network devices end up), be it via-rhine, ndiswrapper loading the Micro$oft Broadcom driver, or bcm43xx, combines very badly with DRI unless acpi=noirq is specified. I checked this with 2.6.18 both with and without the via-quirks patch.
In reply to #45: so with acpi=noirq, can you use all your hardware?
Created attachment 9384 [details]
cat /proc/interrupts of kernel 2.6.18 on x86_64
Created attachment 9385 [details]
dmesg for 2.6.19-RC4 W/O notsc
Created attachment 9386 [details]
and /proc/interrupts
Created attachment 9387 [details]
dmesg for 2.6.19-RC4 W notsc
works better
Created attachment 9388 [details]
and /proc/interrupts
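Since several of the reports here come down to which devices share an IRQ line, a sketch like this could help when comparing the /proc/interrupts attachments. It assumes the 2.6-era layout (per-CPU counts, then a controller name such as XT-PIC or IO-APIC-level, then a comma-separated handler list); newer kernels add extra columns that this does not handle. The function names are hypothetical.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: {"total", "controller", "handlers"}}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip non-numeric rows such as NMI, LOC, ERR
        fields = rest.split()
        counts = []
        while len(counts) < ncpus and fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        controller = fields[0] if fields else ""
        names = " ".join(fields[1:]).split(",")
        handlers = [n.strip() for n in names if n.strip()]
        table[int(label)] = {
            "total": sum(counts),
            "controller": controller,
            "handlers": handlers,
        }
    return table


def shared_irqs(table):
    """Return only the IRQ lines that have more than one handler attached."""
    return {irq: e["handlers"] for irq, e in table.items() if len(e["handlers"]) > 1}
```

Running shared_irqs(parse_interrupts(text)) on the before/after files gives a quick view of whether the network device ended up sharing a line with something else.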
Created attachment 9389 [details]
list of interrupts on windows XP
It may help somewhat to know how Windows maps interrupts.
Possibly-related observation: I put Debian on an old Toshiba 2060CDS, and get a 'nobody cared interrupt disabled on IRQ 11' unless I use acpi=force on that thing. By default the kernel switches off ACPI and proceeds to bungle interrupts. Debian source version 2.6.18.
Created attachment 9429 [details]
dmesg kernel 2.6.19-RC4-mm2 x86_64 boot only with report_lost_ticks
I chose this kernel because it includes the newest hrtimers. It has a very long
oops! Now it doesn't hang on boot, but the computer hangs after some minutes of
uptime, as it does with previous kernels without the notsc boot option.
Created attachment 9430 [details]
dmesg kernel 2.6.19-RC4-mm2 x86_64 boot with notsc and report_lost_ticks
It also has an oops, which I can reproduce when I run service network restart
(I think it unloads and reloads usbnet.ko and eth0). The computer doesn't hang
(at least not easily), but there is no clocksource other than jiffies:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies
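The same check can be scripted when testing many kernels. This is only a sketch: the sysfs directory is the one used in the commands in this report, while the function name, the base parameter and the only_jiffies flag are additions made for illustration.

```python
from pathlib import Path

# Directory used by the cat commands in this report.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return the available and current clocksources found under base."""
    base = Path(base)
    available = (base / "available_clocksource").read_text().split()
    current_file = base / "current_clocksource"
    current = current_file.read_text().strip() if current_file.exists() else None
    return {
        "available": available,
        "current": current,
        # True when the kernel offers nothing better than jiffies,
        # as in the output above.
        "only_jiffies": available == ["jiffies"],
    }
```

Passing a different base directory makes it possible to run the function against copies of these files saved from another machine.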
Created attachment 9431 [details]
cat /proc/interrupts for last dmesg
Created attachment 9434 [details]
2.6.19-RC5-mm1 x86_64 boot only with report_lost_ticks
Created attachment 9435 [details]
dmesg kernel 2.6.19-RC5-mm1 x86_64 boot with notsc and report_lost_ticks
Created attachment 9670 [details]
2.6.18-3.rt10 only with report_lost_ticks initcall_debug
clean boot...
Created attachment 9671 [details]
same 2.6.18-3.rt10 but with a long oops
Today I found that this could be just a problem with the VIA Rhine II. I got exactly the same problem described at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=245398;msg=107 and mentioned at http://bugme.osdl.org/show_bug.cgi?id=2119
Chaps,
I am experiencing the same problem as the original poster (after a random idle period: IRQ 193 nobody cared), and then the network gives up.
Hardware:
ASUS P5VDC-X (VIA Rhine II onboard NIC)
Pentium 4 930D (dual core, 64bit)
VIA vt8237a chipset
openSUSE 10.2 (but also the same problem on FC6)
SUSE Kernel 2.6.18.2-34-default
NVIDIA GeForce 6500 pci express (NVIDIA driver used)
If I can help with some testing or adding any further information (dmesg etc) then let me know.
Just a thought: could this be power saving related?
Yes, please attach dmesg and cat /proc/interrupts here, taken after the oops.
Created attachment 10037 [details]
2.6.20-rc3.1.rt0.0066 #0 SMP PREEMPT
20-rc3-rt, not the new rc4. It has a funny thing: it doesn't hang on boot, but
sometimes I have to wait about 5 minutes for it to boot. I'm attaching it because
an oops appears that could be useful to you for debugging.
Apart from this issue, it works.
Created attachment 10305 [details]
acpidump
Generated with: acpidump >& acpidump.txt
Otherwise stderr shows me this message: "Wrong checksum for generic table!"
Created attachment 10389 [details]
lspci -vvv on 2.6.18
Similar problem with via_rhine : on boot, I have the message: "link is not
ready".
dmesg says:
eth0: VIA Rhine II at 0x1d000, 00:18:f3:b5:b7:75, IRQ 233.
eth0: MII PHY found at address 1, status 0x7849 advertising 01e1 Link 0000.
I tried booting with acpi=noirq, irqpoll and lapic, but nothing changed.
See attached lspci.
After many hours of stressing the network I could reproduce this once:
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
If I restart the network and remove the eth0 modules, I can re-enable the network and carry on. But if I leave the computer with the transmit timeouts, it hangs after some minutes.
netstat -i also gives me 2 or 3 TX-ERRs.
I have tested with the VIA Rhine but also with an 8139too, which gives me the same problem.
Dirk Behme pointed me to this patch http://www.ussg.iu.edu/hypermail/linux/kernel/0612.1/0642.html in this thread http://www.mail-archive.com/linux-rt-users@vger.kernel.org/msg00089.html but I don't know the status of this patch.
OK, one real message:
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Transmit timeout, status 0c 0005 c07f media 10.
eth0: Tx queue start entry 23622 dirty entry 23618.
eth0: Tx descriptor 0 is 0008a1f9.
eth0: Tx descriptor 1 is 0008a586.
eth0: Tx descriptor 2 is 0008a04a. (queue head)
eth0: Tx descriptor 3 is 0008a042.
Created attachment 11003 [details]
2.6.21-rc5 patch to remove irq compression
please reproduce this failure with 2.6.21-rc5
and then test if the attached patch helps.
Hi, I tested Fedora kernel 2.6.20-1.3036, which is based on 2.6.21-rc5-git4, and it looks good:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm jiffies tsc
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
I booted with report_lost_ticks initcall_debug and found no lost ticks.
Hmm, I only use/test the x86_64 arch on this computer and your patch is for i386. The message says: "The same code was already removed from x86_64". BTW, I haven't tested your patch yet.
re: the patch in comment #69 is for i386 mode only. If you're running the latest x86_64, you've already got it.
re: comment #70
So what is currently still broken in the latest kernel?
Like I said in #70, kernel 2.6.21-rc5 looks fine. Over the next weeks I will run many stress tests. If it passes I will close this bug; otherwise I will report the problems.
Thanks
OK, with this kernel 2.6.21-rc5+ I have run tests and the computer definitely works much better. I don't see (so far) any problems with USB2 or the network. It doesn't need notsc or any other boot option.
Cool
Hi,
Does anyone know if the fix below is related to a bug report I filed on the Novell site? (http://bugzilla.novell.com/show_bug.cgi?id=229903)
The status of this bug has been set to resolved, so I am guessing there must have been an upstream fix that has addressed both bugs.
Andy
Kernel 2.6.21 fixes these issues.