Bug 6419

Summary: VIA: irq 201: nobody cared on ethernet via_rhine - VIA PT894/VT8237 and Fast clock issues and Lost timer tick(s)!
Product: ACPI
Reporter: Sérgio M Basto (sergio)
Component: Config-Interrupts
Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, andy, rl
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.18 vanilla
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
dmesg
lspci -vvv
cat /proc/interrupts
2.6.17-rc2-git5 boot with report_lost_ticks and notsc
dmesg | grep -i lost without notsc after few minuts of uptime
dmesg of 2.6.16-1.2195_FC6.root is based on kernel-2.6.17-rc3-git11
Linux 2.6.16-1.2202 x86_64 dmesg
Now with kernel x86_64 based on 2.6.18-rc1-git4 ,
pci quirk via irq behaviour change V3 http://lkml.org/lkml/diff/2006/9/7/235/1
http://lkml.org/lkml/diff/2006/9/7/235/1 pci quirk via irq behaviour change V3
dmesg from the .13 kernel.
dmesg from 2.6.17.13
dmesg, interrupts and lspci from the 17.13 kernel
usual info from vanilla 2.6.18 (still broken)
usual info from vanilla 2.6.18 + via quirk patch (works)
usual info and more, 2.6.18 with via patch (still broken)
my xorg.conf
xorg.log with DRI enabled
xorg.log without DRI enabled
cat /proc/interrupts of kernel 2.6.18 on x86_64
dmesg for 2.6.19-RC4 W/O notsc
and /proc/interrrupts
dmesg for 2.6.19-RC4 W notsc
and /proc/interrupts
list of interrupts on windows XP
dmesg kernel 2.6.19-RC4-mm2 x86_64 boot only with report_lost_ticks
dmesg kernel 2.6.19-RC4-mm2 x86_64 boot with notsc and report_lost_ticks
cat /proc/interrupts for last dmesg
2.6.19-RC5-mm1 x86_64 boot only with report_lost_ticks
dmesg kernel 2.6.19-RC5-mm1 x86_64 boot with notsc and report_lost_ticks
2.6.18-3.rt10 only with report_lost_ticks initcall_debug
same 2.6.18-3.rt10 but with a long oops
2.6.20-rc3.1.rt0.0066 #0 SMP PREEMPT
acpidump
lspci -vvv on 2.6.18
2.6.21-rc5 patch to remove irq compression

Description Sérgio M Basto 2006-04-20 16:23:03 UTC
Most recent kernel where this bug did not occur: 2.6.16-1.2080_FC5.x86_64
Distribution: FC5
Hardware Environment: Intel(R) Pentium(R) D CPU 2.80GHz (64bits)
Software Environment: 
Problem Description: the NIC goes down
Steps to reproduce: leave the computer idle for a while
 
irq 18: nobody cared (try booting with the "irqpoll" option)

Call Trace: <IRQ> <ffffffff8015a8c4>{__report_bad_irq+48}
       <ffffffff8015aac2>{note_interrupt+433} <ffffffff8015a388>{__do_IRQ+191}
       <ffffffff8010ce0d>{do_IRQ+59} <ffffffff8010ad72>{ret_from_intr+0}
       <ffffffff801377f6>{__do_softirq+74} <ffffffff8010bc36>{call_softirq+30}
       <ffffffff8010cb74>{do_softirq+44} <ffffffff8010ce12>{do_IRQ+64}
       <ffffffff8010989f>{mwait_idle+0} <ffffffff8010ad72>{ret_from_intr+0} <EOI>
       <ffffffff8010989f>{mwait_idle+0} <ffffffff80339b56>{thread_return+0}
       <ffffffff801098d5>{mwait_idle+54} <ffffffff8010987c>{cpu_idle+151}
       <ffffffff8053a81b>{start_kernel+470} <ffffffff8053a298>{_sinittext+664}
handlers:
[<ffffffff880afb53>] (rhine_interrupt+0x0/0xb0e [via_rhine])
Disabling IRQ #18
Comment 1 Sérgio M Basto 2006-04-20 16:25:15 UTC
Created attachment 7925 [details]
dmesg
Comment 2 Sérgio M Basto 2006-04-20 16:27:13 UTC
Created attachment 7926 [details]
lspci -vvv
Comment 3 Sérgio M Basto 2006-04-20 16:28:49 UTC
Created attachment 7927 [details]
cat /proc/interrupts
Comment 4 Sérgio M Basto 2006-04-20 16:38:22 UTC
I tried the suggestion from dmesg and booted with the "irqpoll" option, but the
system locks up after booting, once X is loaded and I type a few letters.
I googled a bit for similar cases and found reports for all distros (Debian,
Gentoo, SUSE, Red Hat, etc.), but none on the ACPI mailing list.

Thanks in advance.
Comment 5 Shaohua 2006-04-20 19:29:47 UTC
>PCI: Via IRQ fixup for 0000:00:12.0, from 5 to 2
Can you try the latest base kernel? This might be a VIA IRQ quirk issue that is
already fixed in the base kernel.
Comment 6 Sérgio M Basto 2006-04-21 15:36:25 UTC
ifconfig eth0 down
rmmod via_rhine

Well, for now this has resolved the oops; it may be a problem specific to the
via_rhine kernel module.
Comment 7 Sérgio M Basto 2006-04-22 18:15:10 UTC
After booting at Apr 22 17:42:17 and removing the via_rhine module, I got this
at 23:15:00:

Apr 22 23:15:00 localhost kernel: warning: many lost ticks.
Apr 22 23:15:00 localhost kernel: Your time source seems to be instable or some driver is hogging interupts
Apr 22 23:15:00 localhost kernel: rip mwait_idle+0x36/0x4a

Any clue?
Comment 8 Sérgio M Basto 2006-04-24 06:13:39 UTC
I tried 2.6.17-rc2-git3 and the "irq: nobody cared" problem seems resolved.

But I still have the fast clock issues on my dual-core Intel (64-bit) on a VIA
motherboard. Booting with no_timer_check seems to stabilize the machine and the
clock, and messages like "many lost ticks" no longer appear.
Comment 9 Sérgio M Basto 2006-04-24 18:05:49 UTC
Hi,
I updated my kernel to 2.6.16-1.2153, which is the same as
kernel-2.6.17-rc2-git5.

More importantly, I found a better kernel boot parameter that seems to give
much greater stability. The parameter is

notsc
Comment 10 Sérgio M Basto 2006-04-24 18:09:21 UTC
Created attachment 7945 [details]
2.6.17-rc2-git5 boot with report_lost_ticks and notsc
Comment 11 Len Brown 2006-04-26 19:08:38 UTC
The IRQ re-naming code has made what VIA thinks is IRQ23 into IRQ18:

via-rhine.c:v1.10-LK1.2.0-2.6 June-10-2004 Written by Donald Becker
GSI 18 sharing vector 0xC9 and IRQ 18
ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 23 (level, low) -> IRQ 201
PCI: Via IRQ fixup for 0000:00:12.0, from 5 to 9
eth0: VIA Rhine II at 0xf7fffc00, 00:13:8f:6e:8f:c5, IRQ 201.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 0021.

This might be confusing the VIA quirk on the failing kernel,
and perhaps that quirk was fixed on the working kernels:

> I updated my kernel to 2.6.16-1.2153, which is the same as
> kernel-2.6.17-rc2-git5

So both of these work properly with no cmdline parameters?

If yes, why do you need "notsc", and what bad things happen
when you don't use it?
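
A possible reading of the "Via IRQ fixup ... from 5 to 9" line above, as a minimal userspace sketch: the quirk of that era derived the value it wrote to PCI_INTERRUPT_LINE as `dev->irq & 0xf` (the line removed by the patch in comment 26). Assuming that is what happened here, the renamed IRQ 201 masks down to 9, and IRQ 18 masks down to 2, which would match the "from 5 to 2" quoted in comment 5. `masked_irq` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/*
 * Userspace illustration only, not kernel code.
 * masked_irq() is a hypothetical helper that mimics the
 * "dev->irq & 0xf" step of the old VIA IRQ quirk.
 */
uint8_t masked_irq(unsigned int irq)
{
	/* Keep only the low 4 bits, i.e. force the value into 0..15. */
	return (uint8_t)(irq & 0xf);
}
```

For example, masked_irq(201) evaluates to 9 and masked_irq(18) to 2, while values already in the range 0..15 pass through unchanged.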

Comment 12 Sérgio M Basto 2006-04-27 05:05:26 UTC
>> I updated my kernel to 2.6.16-1.2153, which is the same as
>> kernel-2.6.17-rc2-git5

>So both of these work properly with no cmdline parameters?

Yes, Red Hat kernels are very close to the base kernel; they are just a way of
getting a compiled kernel without much trouble.

>If yes, why do you need "notsc", and what bad things happen
>when you don't use it?

notsc did the trick. Without notsc I get problems such as lost ticks and fast
clock issues.
Another related problem was the keyboard: sometimes a single key press produced
the same character 3 or 4 times, which was very annoying.

< Kernel command line: ro root=LABEL=/1
---
> Kernel command line: ro root=LABEL=/1 report_lost_ticks notsc
53,55c52,55
< PID hash table entries: 4096 (order: 12, 131072 bytes)
< time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
< time.c: Detected 2793.150 MHz processor.
---
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> Disabling vsyscall due to use of PM timer
> time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
> time.c: Detected 2793.051 MHz processor.

notsc changes something in the timer setup!

Thanks

Comment 13 Sérgio M Basto 2006-05-03 17:49:47 UTC
>> I updated my kernel to 2.6.16-1.2153, which is the same as
>> kernel-2.6.17-rc2-git5

>So both of these work properly with no cmdline parameters?
Sorry, I consider kernel-2.6.16-1.2153 and kernel-2.6.17-rc2-git5 to be the
same kernel.
Do they work well without boot options? No, they don't; they need "notsc".
Comment 14 Sérgio M Basto 2006-05-03 18:01:30 UTC
Created attachment 8022 [details]
dmesg | grep -i lost without notsc after few minuts of uptime 

Without notsc in the boot options, after a few minutes of uptime.
Comment 15 Sérgio M Basto 2006-05-08 16:41:17 UTC
Well, with 2.6.17-rc3-git11 and notsc the computer seems stable after this
weekend of tests.
I would like to point out some patches that have entered the kernel:

http://lkml.org/lkml/2006/4/19/16 (entered the git snapshots just after rc3 and
works great for my VIA8237)
http://lkml.org/lkml/2005/8/13/30 the second part of this patch is made
obsolete by the first
http://lkml.org/lkml/2006/3/11/83 this patch gives me: "PCI: Unexpected Value in
PCI-Register : no Change!", so the messages should be nicer

http://lkml.org/lkml/2004/11/16/19 this one gives me: "sata_via 0000:00:0f.0:
routed to hard irq line 11"; since the rest is dev_printk(KERN_DEBUG, I think I
cannot see whether any quirk is being applied or not
Comment 16 Sérgio M Basto 2006-05-09 16:37:14 UTC
Created attachment 8075 [details]
dmesg of 2.6.16-1.2195_FC6.root is based on kernel-2.6.17-rc3-git11

Well, one more day and I still have not found any problem!
Comment 17 Sérgio M Basto 2006-05-18 17:01:50 UTC
Created attachment 8143 [details]
Linux 2.6.16-1.2202 x86_64 dmesg 

Well, I found a problem:
after installing the closed-source nvidia kernel modules, I got the same
problem on the ethernet again:
irq 201: nobody cared (try booting with the "irqpoll" option)
Call Trace: <IRQ> <ffffffff802aee6e>{__report_bad_irq+48}
       <ffffffff802af06c>{note_interrupt+433} <ffffffff802ae985>{__do_IRQ+189}
       <ffffffff8026e086>{do_IRQ+60} <ffffffff80259fff>{mwait_idle+0}
       <ffffffff80260252>{ret_from_intr+0} <EOI> <ffffffff80259fff>{mwait_idle+0}
       <ffffffff80264983>{thread_return+0} <ffffffff8025a035>{mwait_idle+54}
       <ffffffff8024b9f0>{cpu_idle+151} <ffffffff806b3825>{start_kernel+502}
       <ffffffff806b3298>{_sinittext+664}
handlers:
[<ffffffff8817cb08>] (rhine_interrupt+0x0/0xae2 [via_rhine])
Disabling IRQ #201
Comment 18 Sérgio M Basto 2006-05-18 17:09:25 UTC
The problem seems to only stop ethernet until the network is restarted;
everything else seems to work fine.
Comment 19 Sérgio M Basto 2006-07-13 17:03:05 UTC
Created attachment 8543 [details]
Now with kernel x86_64 based on 2.6.18-rc1-git4 ,

ends with :

uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host controller halted, very bad!
uhci_hcd 0000:00:10.1: HC died; cleaning up
usb 2-2: USB disconnect, address 2
PM: Removing info for No Bus:usbdev2.2_ep85
eth1: unregister 'cdc_ether' usb-0000:00:10.1-2, CDC Ethernet Device
PM: Removing info for usb:2-2:1.0
PM: Removing info for No Bus:usbdev2.2_ep81
PM: Removing info for No Bus:usbdev2.2_ep02
PM: Removing info for usb:2-2:1.1
PM: Removing info for No Bus:usbdev2.2
PM: Removing info for No Bus:usbdev2.2_ep00
PM: Removing info for usb:2-2
Comment 20 Sérgio M Basto 2006-07-13 17:18:49 UTC
In reply to my last comment:
With
rmmod uhci_hcd
and
modprobe uhci_hcd

I can get the network back.
Comment 21 Sérgio M Basto 2006-08-07 13:51:40 UTC
Kernel 2.6.18-rc4 resolves the USB problem from comment #19.
Comment 22 Sérgio M Basto 2006-09-06 18:49:42 UTC
Now I only have an interrupt problem when I use the closed-source nvidia driver;
with the open-source nvidia driver the computer works perfectly, which makes me
think that the problem lies on the nvidia side.
Comment 23 Jim Bray 2006-09-13 15:07:06 UTC
  This looks like it is probably at root the same bug I'm seeing on my
Averatec laptop (which according to lspci uses mostly Via chips). For me it
started somewhere in the latter Ubuntu and Debian versions of the 2.6.15
kernels and has continued up to the most current 2.6.17 versions. With me,
ndiswrapper is given IRQ 11, but as soon as X starts up (using the
Via Unichrome driver) I get the 'irq 11: nobody cared' message. Workaround
is acpi=noirq. Please let me know if I should submit elsewhere, add more info,
etc.
Comment 24 Sérgio M Basto 2006-09-15 05:02:55 UTC
In reply to comment #23:
please attach the usual things:
dmesg, lspci -vvv and cat /proc/interrupts
Comment 25 Sérgio M Basto 2006-09-17 09:39:17 UTC
Created attachment 9037 [details]
pci quirk via irq behaviour change V3 http://lkml.org/lkml/diff/2006/9/7/235/1

My 2 computers work better and stay more stable with this patch. I believe it
is needed for these computers to work correctly.
Comment 26 Sérgio M Basto 2006-09-17 09:41:42 UTC
Comment on attachment 9037 [details]
pci quirk via irq behaviour change V3 http://lkml.org/lkml/diff/2006/9/7/235/1

===================================================================
--- linux.orig/drivers/pci/quirks.c
+++ linux/drivers/pci/quirks.c
@@ -650,11 +650,43 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_V
  * Some of the on-chip devices are actually '586 devices' so they are
  * listed here.
  */
+
+static int via_irq_fixup_needed = -1;
+
+/*
+ * As some VIA hardware is available in PCI-card form, we need to restrict
+ * this quirk to VIA PCI hardware built onto VIA-based motherboards only.
+ * We try to locate a VIA southbridge before deciding whether the quirk
+ * should be applied.
+ */
+static const struct pci_device_id via_irq_fixup_tbl[] = {
+	{
+		.vendor 	= PCI_VENDOR_ID_VIA,
+		.device 	= PCI_ANY_ID,
+		.subvendor	= PCI_ANY_ID,
+		.subdevice	= PCI_ANY_ID,
+		.class		= PCI_CLASS_BRIDGE_ISA << 8,
+		.class_mask	= 0xffff00,
+	},
+	{ 0, },
+};
+
 static void quirk_via_irq(struct pci_dev *dev)
 {
	u8 irq, new_irq;

-	new_irq = dev->irq & 0xf;
+	if (via_irq_fixup_needed == -1)
+		via_irq_fixup_needed = pci_dev_present(via_irq_fixup_tbl);
+
+	if (!via_irq_fixup_needed)
+		return;
+
+	new_irq = dev->irq;
+
+	/* Don't quirk interrupts outside the legacy IRQ range */
+	if (!new_irq || new_irq > 15)
+		return;
+
	pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
	if (new_irq != irq) {
		printk(KERN_INFO "PCI: VIA IRQ fixup for %s, from %d to %d\n",
@@ -663,13 +695,7 @@ static void quirk_via_irq(struct pci_dev
		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
	}
 }
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_0, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_1, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_3, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_4, quirk_via_irq);
-DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irq);
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq);

 /*
  * VIA VT82C598 has its device ID settable and many BIOSes
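
The effect of the patch above can be sketched in plain C, outside the kernel. This is only an illustration under assumptions: `should_write_irq` and its parameters are hypothetical names, and the real quirk_via_irq() uses pci_dev_present(), dev->irq and pci_read_config_byte() rather than function arguments.

```c
#include <stdbool.h>

/*
 * Userspace sketch of the decision made by the patched quirk_via_irq().
 * should_write_irq() and its parameters are hypothetical; the real code
 * reads and writes PCI config space instead of taking arguments.
 */
bool should_write_irq(bool via_southbridge_present,
		      unsigned int dev_irq,
		      unsigned int interrupt_line)
{
	/* No VIA ISA bridge found: the quirk is skipped entirely. */
	if (!via_southbridge_present)
		return false;

	/* IRQ 0 and anything above the legacy range (>15) is left alone. */
	if (dev_irq == 0 || dev_irq > 15)
		return false;

	/* Only rewrite PCI_INTERRUPT_LINE when it disagrees with dev->irq. */
	return dev_irq != interrupt_line;
}
```

With these rules a renamed IRQ such as 201 is never written back to the device, whereas a legacy IRQ such as 11 still is when the register holds a different value.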
Comment 27 Sérgio M Basto 2006-09-17 09:48:59 UTC
Created attachment 9038 [details]
http://lkml.org/lkml/diff/2006/9/7/235/1 pci quirk via irq behaviour change V3

Please ignore comments #25 and #26.
My 2 computers work better and stay more stable with this patch. I believe it
is needed for these computers to work correctly.
Comment 28 Jim Bray 2006-09-19 15:28:46 UTC
 I built a vanilla 2.6.17.13, and the problem persists there. I'll attach
more info (sorry, forgot the -vvv with that kernel, but will add with the
2.6.15 currently running).
Comment 29 Jim Bray 2006-09-19 15:31:22 UTC
Created attachment 9049 [details]
dmesg from the .13 kernel.
Comment 30 Jim Bray 2006-09-19 15:36:07 UTC
Created attachment 9050 [details]
dmesg from 2.6.17.13
Comment 31 Jim Bray 2006-09-19 15:42:07 UTC
Created attachment 9051 [details]
dmesg, interrupts and lspci from the 17.13 kernel 

 I combined all the info for convenience, separated by '*******' comments.
Comment 32 Sérgio M Basto 2006-09-19 16:53:24 UTC
Jim,
1 - text/plan => text/plain

2 - Why are you trying to enable lapic? Don't! Please try removing the lapic
option, because I think the interrupts still run without the APIC: they are on XT_PIC.

3 - Because the interrupts are on XT_PIC and some VIA PCI devices aren't quirked,
you should try http://lkml.org/lkml/diff/2006/9/7/235/1 (pci quirk via irq behaviour change V3)
Comment 33 Jim Bray 2006-09-20 16:07:39 UTC
> 1 - text/plan => text/plain

  Yeah, right.

> 2 - Why are you trying to enable lapic?

  enabling lapic makes no difference in this case. As for why I enable it,
because it is supposed to be *Better* in some way I know little about.

> try http://lkml.org/lkml/diff/2006/9/7/235/1 (pci quirk via irq behaviour change V3)
  I've just pulled 2.6.18, will try that, and will consider the patch if the
bug persists.

Comment 34 Sérgio M Basto 2006-09-20 16:21:35 UTC
> > 2 - Why are you trying to enable lapic?

>  enabling lapic makes no difference in this case. As for why I enable it,
> because it is supposed to be *Better* in some way I know little about.

But isn't this machine meant to work without lapic (as the BIOS says)?
I had a laptop that had problems because this was the default behavior:
http://www.pps.jussieu.fr/%7Ejch/software/presario/
http://sergiomb.no-ip.org/laptop/index.html

After much investigation we think that the lapic isn't programmed at all.
Comment 35 Jim Bray 2006-09-21 10:06:19 UTC
 This problem is not fixed (for me) with 2.6.18. I applied the recommended
via-quirks patch (didn't patch clean, had to edit the second chunk in), and
the problem appears to be solved. I will attach the usual info from before and
after the patch.
Comment 36 Jim Bray 2006-09-21 10:08:42 UTC
Created attachment 9065 [details]
usual info from vanilla 2.6.18 (still broken)
Comment 37 Jim Bray 2006-09-21 10:10:01 UTC
Created attachment 9066 [details]
usual info from vanilla 2.6.18 + via quirk patch (works)
Comment 38 Sérgio M Basto 2006-09-21 10:44:16 UTC
Jim, so you need the patch.
Please boot (with the patch) without any parameters (no lapic, no acpi=noirq)
and report the results:

I SHOULDN'T use lapic !!
Comment 39 Jim Bray 2006-09-21 11:42:51 UTC
Created attachment 9067 [details]
usual info and more, 2.6.18 with via patch (still broken)

 Doh! Been using make-kpkg and update-grub too long, out of practice with
vanilla kernels. I carefully rebuilt and reinstalled 2.6.18, with the patch (I have
included pci/quirks.c for verification). Since the bug triggers for me when
X starts up, which starts agpgart, I included lspci and interrupts from both
before and after this point.
Comment 40 Sérgio M Basto 2006-09-21 12:20:05 UTC
Funny, you have a VIA Rhine II and the oops comes at exactly 200000 interrupts.
I am beginning to suspect the problem is in the via_rhine driver.
Thanks.
Comment 41 Sérgio M Basto 2006-09-22 06:24:55 UTC
OK, I need one more report: vi /etc/X11/xorg.conf
Please comment out the dri module line, like this:    # Load  "dri"
and see whether X starts without problems. Let's see if it is a problem with via_agp.
Comment 42 Jim Bray 2006-09-22 09:59:51 UTC
Created attachment 9071 [details]
my xorg.conf

  Yes, commenting out DRI stops the problem. I've attached my xorg.conf.
I'm using the via driver. I tried commenting out the EnableAGPDMA option for
that driver, but that had no effect. I'll also attach the X log files with
and without DRI, which might be useful.
Comment 43 Jim Bray 2006-09-22 10:02:46 UTC
Created attachment 9072 [details]
xorg.log with DRI enabled
Comment 44 Jim Bray 2006-09-22 10:03:59 UTC
Created attachment 9073 [details]
xorg.log without DRI enabled
Comment 45 Jim Bray 2006-09-22 18:00:10 UTC
  I managed to get bcm43xx working, but the problem not only persists but in a
worse way. With DRI enabled, X hangs up hard enough to require a hard reboot.
So it appears that any network device using IRQ11 (which seems to be where all
my network devices end up), be it via-rhine, ndiswrapper loading the Micro$oft
Broadcom driver, or bcm43xx, combines very badly with DRI unless acpi=noirq
is specified. I checked this with 2.6.18 both with and without the via-quirks patch.
Comment 46 Sérgio M Basto 2006-10-03 20:13:42 UTC
In reply to #45: so with acpi=noirq, does all your hardware work?
Comment 47 Sérgio M Basto 2006-10-31 15:53:48 UTC
Created attachment 9384 [details]
cat /proc/interrupts of kernel 2.6.18 on x86_64
Comment 48 Sérgio M Basto 2006-10-31 17:15:33 UTC
Created attachment 9385 [details]
dmesg for 2.6.19-RC4 W/O notsc
Comment 49 Sérgio M Basto 2006-10-31 17:17:03 UTC
Created attachment 9386 [details]
and /proc/interrrupts
Comment 50 Sérgio M Basto 2006-10-31 17:18:51 UTC
Created attachment 9387 [details]
dmesg for 2.6.19-RC4 W notsc

works better
Comment 51 Sérgio M Basto 2006-10-31 17:23:36 UTC
Created attachment 9388 [details]
and /proc/interrupts
Comment 52 Sérgio M Basto 2006-10-31 17:25:17 UTC
Created attachment 9389 [details]
list of interrupts on windows XP

It may help to know how Windows maps the interrupts.
Comment 53 Jim Bray 2006-11-01 17:21:00 UTC
 Possibly-related observation: I put Debian on an old Toshiba 2060CDS, and
get a 'nobody cared interrupt disabled on IRQ 11' unless I use
acpi=force on that thing. By default the kernel switches off the ACPI and
proceeds to bungle interrupts. Debian source version 2.6.18.
Comment 54 Sérgio M Basto 2006-11-07 15:59:58 UTC
Created attachment 9429 [details]
dmesg kernel 2.6.19-RC4-mm2 x86_64  boot only with report_lost_ticks 

I chose this kernel because it includes the newest hrtimers. It has a very long
oops! It no longer hangs on boot, but the computer hangs after some minutes of
uptime, as previous kernels did without the notsc boot option.
Comment 55 Sérgio M Basto 2006-11-07 16:06:56 UTC
Created attachment 9430 [details]
dmesg kernel 2.6.19-RC4-mm2 x86_64  boot with notsc and report_lost_ticks

There is also an oops, which I can reproduce by running service network restart
(I think it unloads and reloads usbnet.ko and eth0). The computer doesn't hang
(at least not easily),
but there is no clocksource other than jiffies:

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies
Comment 56 Sérgio M Basto 2006-11-07 16:08:11 UTC
Created attachment 9431 [details]
cat /proc/interrupts for last dmesg
Comment 57 Sérgio M Basto 2006-11-08 17:01:15 UTC
Created attachment 9434 [details]
2.6.19-RC5-mm1 x86_64 boot only with report_lost_ticks
Comment 58 Sérgio M Basto 2006-11-08 17:02:28 UTC
Created attachment 9435 [details]
dmesg kernel 2.6.19-RC5-mm1 x86_64 boot with notsc and report_lost_ticks
Comment 59 Sérgio M Basto 2006-11-29 17:50:45 UTC
Created attachment 9670 [details]
2.6.18-3.rt10 only with report_lost_ticks initcall_debug

clean boot...
Comment 60 Sérgio M Basto 2006-11-29 17:52:41 UTC
Created attachment 9671 [details]
same 2.6.18-3.rt10 but with a long oops
Comment 61 Sérgio M Basto 2006-12-12 20:55:52 UTC
Today I found that this could just be a problem with the VIA Rhine II.
I got exactly the same problem described at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=245398;msg=107
and mentioned at http://bugme.osdl.org/show_bug.cgi?id=2119
Comment 62 Andrew Haynes 2006-12-20 02:20:36 UTC
Chaps,

I am experiencing the same problem as the original poster (after a random idle
period - IRQ 193 nobody cared) -- Then network gives up.

Hardware --
ASUS P5VDC-X (Via Rhine II onboard NIC)
Pentium 4 930D (dual core, 64bit) 
VIA vt8237a chipset
openSUSE 10.2 (but also the same problem on FC6)
SUSE Kernel 2.6.18.2-34-default
NVIDIA GeForce 6500 pci express (NVIDIA driver used)

If I can help with some testing or adding any further information (dmesg etc)
then let me know. Just a thought; could this be power saving related?
Comment 63 Sérgio M Basto 2006-12-20 09:21:40 UTC
Yes, please attach dmesg and cat /proc/interrupts here, taken after the oops
has occurred.
Comment 64 Sérgio M Basto 2007-01-08 17:58:38 UTC
Created attachment 10037 [details]
2.6.20-rc3.1.rt0.0066 #0 SMP PREEMPT

This is 20-rc3-rt, not the new rc4. A funny thing: it doesn't hang on boot, but
sometimes I have to wait about 5 minutes for it to boot, because an oops
appears, which could be useful to you for debugging.
Apart from this issue, it works.
Comment 65 Sérgio M Basto 2007-02-05 16:49:36 UTC
Created attachment 10305 [details]
acpidump 

acpidump >& acpidump.txt   

Otherwise stderr shows me this message:  "Wrong checksum for generic table!"
Comment 66 Thibault North 2007-02-12 01:41:00 UTC
Created attachment 10389 [details]
lspci -vvv on 2.6.18

Similar problem with via_rhine : on boot, I have the message: "link is not
ready".
dmesg says:
eth0: VIA Rhine II at 0x1d000, 00:18:f3:b5:b7:75, IRQ 233.
eth0: MII PHY found at address 1, status 0x7849 advertising 01e1 Link 0000.

I tried booting with acpi=noirq, irqpoll and lapic, but nothing changed.
See attached lspci.
Comment 67 Sérgio M Basto 2007-03-06 18:38:40 UTC
After many hours of stressing the network I could reproduce it once:
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

If I restart the network and reload the eth0 module, I can re-enable the network
and carry on.
But if I leave the computer with the transmit timeouts, it hangs after some
minutes.

netstat -i also shows 2 or 3 TX-ERRs.

I have tested with the VIA Rhine but also with an 8139too, which gives me the
same problem.

Dirk Behme pointed me to this patch
http://www.ussg.iu.edu/hypermail/linux/kernel/0612.1/0642.html
on this thread
http://www.mail-archive.com/linux-rt-users@vger.kernel.org/msg00089.html
but I don't know the status of this patch.
Comment 68 Sérgio M Basto 2007-03-07 16:17:21 UTC
ok one real message : 

NETDEV WATCHDOG: eth0: transmit timed out
eth0: Transmit timeout, status 0c 0005 c07f media 10.
eth0: Tx queue start entry 23622  dirty entry 23618.
eth0:  Tx descriptor 0 is 0008a1f9.
eth0:  Tx descriptor 1 is 0008a586.
eth0:  Tx descriptor 2 is 0008a04a. (queue head)
eth0:  Tx descriptor 3 is 0008a042.
Comment 69 Len Brown 2007-03-30 19:02:25 UTC
Created attachment 11003 [details]
2.6.21-rc5 patch to remove irq compression

please reproduce this failure with 2.6.21-rc5
and then test if the attached patch helps.
Comment 70 Sérgio M Basto 2007-04-02 16:55:29 UTC
Hi,
I tested Fedora kernel 2.6.20-1.3036, which is based on 2.6.21-rc5-git4,
and it looks good:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm jiffies tsc

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm

I booted with report_lost_ticks initcall_debug and
no lost ticks were found.
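
The difference between this output and the jiffies-only list in comment 55 can be checked mechanically. A minimal sketch, assuming the file content is a single line of space-separated names as shown above; `has_clocksource` is a hypothetical helper, not a kernel or libc function.

```c
#include <stdbool.h>
#include <string.h>

/*
 * has_clocksource() is a hypothetical helper: it reports whether `name`
 * appears as a whole space-separated word in `list`, where `list` has the
 * format of the available_clocksource sysfs file.
 */
bool has_clocksource(const char *list, const char *name)
{
	size_t n = strlen(name);
	const char *p = list;

	if (n == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		/* A match counts only if it is bounded by spaces or line ends. */
		bool starts_word = (p == list) || (p[-1] == ' ');
		bool ends_word = (p[n] == '\0') || (p[n] == ' ') || (p[n] == '\n');

		if (starts_word && ends_word)
			return true;
		p += 1;
	}
	return false;
}
```

The string would normally come from reading /sys/devices/system/clocksource/clocksource0/available_clocksource with fgets(); with the output above the helper finds both acpi_pm and tsc, and with the comment 55 output it finds neither.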
Comment 71 Sérgio M Basto 2007-04-02 17:04:37 UTC
Hmm, I only use/test the x86_64 arch on this computer and your patch is for i386.
The patch message says:
The same code was already removed from x86_64

BTW, I haven't tested your patch yet.
Comment 72 Len Brown 2007-04-02 22:00:52 UTC
re: patch in comment #69 is i386 mode only.
If you're running latest x86_64, you've already got it.

re: comment #70
So what is currently still broken in the latest kernel?
Comment 73 Sérgio M Basto 2007-04-03 04:55:56 UTC
As I said in #70, kernel 2.6.21-rc5 looks fine. Over the next weeks I will run
many stress tests; if it passes I will close this bug, otherwise I will report
the problems.
Thanks
Comment 74 Sérgio M Basto 2007-04-11 20:38:00 UTC
OK, I have run tests with this kernel (2.6.21-rc5+) and the computer definitely
works much better; so far I don't see any problems with usb2 or the network.
I don't need notsc or any other boot option.
Cool
Comment 75 Andrew Haynes 2007-05-06 03:01:39 UTC
Hi,

Does anyone know if the fix below is related to a bug report I filed on
the Novell site? (http://bugzilla.novell.com/show_bug.cgi?id=229903)

The status of this bug has been set to resolved so I am guessing there must have
been an upstream fix that has addressed both bugs.

Andy
Comment 76 Sérgio M Basto 2007-05-14 17:06:26 UTC
Kernel 2.6.21 fixes these issues.