Bug 11121 (via-velocity:irq)

Summary: Kernel panic - not syncing: Fatal exception in interrupt when velocity driver started
Product: Drivers
Reporter: Markus Beck (markus)
Component: Network
Assignee: Francois Romieu (romieu)
Status: CLOSED OBSOLETE
Severity: blocking
CC: akpm, alan, albcamus, andi-bz, bunk, lenb, markus
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.26
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: 2.6.26 Kernel Config
do_irq_debug.diff
via_velocity_registration_order.diff
cat /proc/interrupts
full dmesg output with do_irq_debug.diff patch
dmesg output with via_velocity_registration_order.diff patch
full dmesg with workaround
do_irq_debug_v2.diff
full dmesg with do_irq_debug_v2.diff patch
acpidump on kernel 2.6.26
full dmesg on kernel 2.6.25.10 (works)
acpidump on kernel 2.6.25.10 (no diff =>2.6.26)
full dmesg with do_irq_debug_v2.diff patch and enabled acpi/device kernel debug
cat /proc/interrupts with APIC
kernel config (kernel panic)
kernel config (eth0 work with irq message)

Description Markus Beck 2008-07-18 21:13:01 UTC
Latest working kernel version: 2.6.26
Earliest failing kernel version: 2.6.25.10
Distribution: Debian 4.1.1-21
Hardware Environment: Via Epia SN (VT6130 chipset)
Software Environment: gcc version 4.1.2
Problem Description:

dmesg|grep eth0

eth0: VIA Rhine II at 0xfafff400, 00:40:63:f3:7b:d7, IRQ 23.
eth0: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000.

ifconfig eth0 up

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<00000000>]
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:

Pid: 2646, comm: ifconfig Not tainted (2.6.26 #3)
EIP: 0060:[<00000000>] EFLAGS: 00010016 CPU: 0
EIP is at 0x0
EAX: 000000a3 EBX: 000000a3 ECX: 000023a8 EDX: c045a188
ESI: 00000000 EDI: f7d37be4 EBP: 000080d0 ESP: f760de5c
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process ifconfig (pid: 2646, ti=f760c000 task=f7d13440 task.ti=f760c000)
Stack: c01048f1 00001000 00001000 c0103a17 00001000 00000107 f7d37000 00001000
       f7d37be4 000080d0 00000000 0000007b 0000007b 00000000 ffffff5c c014c19d
       00000060 00010216 f7d32380 00000000 f7118100 f738e000 c024e171 f7d32380
Call Trace:
 [<c01048f1>] do_IRQ+0x6a/0x7d
 [<c0103a17>] common_interrupt+0x23/0x28
 [<c014c19d>] __kmalloc+0x84/0x9a
 [<c024e171>] velocity_init_td_ring+0x32/0xe1
 [<c024f38e>] velocity_open+0x179/0x21b
 [<c02db39d>] dev_open+0x63/0x93
 [<c02da339>] dev_change_flags+0x92/0x13d
 [<c0311510>] devinet_ioctl+0x230/0x528
 [<c02d2642>] sock_ioctl+0x152/0x174
 [<c02d24f0>] sock_ioctl+0x0/0x174
 [<c0157146>] vfs_ioctl+0x16/0x48
 [<c015735b>] do_vfs_ioctl+0x1e3/0x1f6
 [<c033c56a>] do_page_fault+0x262/0x574
 [<c015739a>] sys_ioctl+0x2c/0x42
 [<c0103025>] sysenter_past_esp+0x6a/0x91
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0068:f760de5c
Kernel panic - not syncing: Fatal exception in interrupt
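
Reading aid for the trace above: the frames are listed innermost first, so the interrupt (common_interrupt -> do_IRQ) appears to have arrived while __kmalloc was running on behalf of velocity_init_td_ring. The following is a minimal sketch, not part of the original report, that splits such frame lines into address, symbol, offset and function size. The helper name parse_frames and the regular expression are illustrative assumptions; only the first five frames are copied from the oops.

```python
import re

# Matches frame lines such as " [<c01048f1>] do_IRQ+0x6a/0x7d".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace_text):
    """Return (address, symbol, offset, size) tuples, one per frame line."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            addr, symbol, offset, size = match.groups()
            frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


# First five frames copied from the oops in this report.
TRACE = """\
 [<c01048f1>] do_IRQ+0x6a/0x7d
 [<c0103a17>] common_interrupt+0x23/0x28
 [<c014c19d>] __kmalloc+0x84/0x9a
 [<c024e171>] velocity_init_td_ring+0x32/0xe1
 [<c024f38e>] velocity_open+0x179/0x21b
"""

frames = parse_frames(TRACE)
for addr, symbol, offset, size in frames:
    print(f"{addr:#010x}  {symbol} (+{offset:#x} of {size:#x})")
```

Note that traces from kernels of this era can contain stale stack entries, so the deeper frames should be read with some caution.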



Steps to reproduce:
ifconfig eth0 up
Comment 1 Markus Beck 2008-07-18 21:18:24 UTC
Latest working kernel version: 2.6.25.10
Earliest failing kernel version: 2.6.26
Comment 2 Andrew Morton 2008-07-19 03:06:38 UTC
Marking this as a regression.
Comment 3 Markus Beck 2008-07-19 05:16:37 UTC
Created attachment 16888 [details]
2.6.26 Kernel Config
Comment 4 Marcin Slusarz 2008-07-19 13:53:18 UTC
Created attachment 16890 [details]
do_irq_debug.diff
Comment 5 Marcin Slusarz 2008-07-19 13:53:54 UTC
Created attachment 16891 [details]
via_velocity_registration_order.diff
Comment 6 Marcin Slusarz 2008-07-19 13:54:24 UTC
Is it reproducible?
Does the call trace always look the same?
Can you provide the contents of /proc/interrupts and the full dmesg output with the do_irq_debug.diff patch applied?
Does applying via_velocity_registration_order.diff change anything?
Comment 7 Markus Beck 2008-07-19 16:47:02 UTC
Created attachment 16895 [details]
cat /proc/interrupts
Comment 8 Markus Beck 2008-07-19 17:45:03 UTC
Created attachment 16896 [details]
full dmesg output with do_irq_debug.diff patch

The panic occurs when the system boots with the interface down and you then try to bring it up manually with "ifconfig ethX up".
With the enable_irq32.... patch applied I cannot see any change.
Comment 9 Markus Beck 2008-07-19 18:11:47 UTC
Created attachment 16897 [details]
dmesg output with via_velocity_registration_order.diff patch
Comment 10 Markus Beck 2008-07-19 18:47:55 UTC
Created attachment 16898 [details]
full dmesg with workaround

Workaround: add "pci=noacpi" to the kernel command line.
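
When comparing boots with and without the workaround, it can help to confirm that the parameter actually reached the kernel. A minimal sketch, assuming the command line has been read from /proc/cmdline; the helper name has_boot_param and the sample string are hypothetical and not taken from the attached logs.

```python
def has_boot_param(cmdline, param):
    """Return True if param appears as a whole word on the kernel command line."""
    return param in cmdline.split()


# Hypothetical command line for illustration; on a live system one would use
# open("/proc/cmdline").read() instead.
sample = "root=/dev/sda1 ro pci=noacpi"
print(has_boot_param(sample, "pci=noacpi"))
```

Matching on whole words avoids a false positive from a longer parameter that merely contains the same text.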
Comment 11 Marcin Slusarz 2008-07-20 03:08:22 UTC
OK, if I understand it correctly, velocity_open -> velocity_init_rd_ring -> velocity_rx_refill -> velocity_give_many_rx_descs -> writew
enables the velocity interrupt, but ACPI IRQ routing maps it to the wrong IRQ handler (struct irq_desc->handle_irq is NULL).

So someone with ACPI knowledge should look at this bug.
Maybe applying the do_irq_debug_v2.diff patch will help narrow it down...
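
To illustrate the hypothesis, here is a toy userspace model, not kernel code: an interrupt is dispatched through a per-IRQ descriptor, and if the descriptor's flow handler was never set, the call goes through a NULL pointer, which would match "EIP is at 0x0" in the oops. The class IrqDesc and the function dispatch are assumptions made for this sketch; only the field names handle_irq and action come from this thread.

```python
class IrqDesc:
    """Toy stand-in for struct irq_desc; only two fields are modelled."""

    def __init__(self, handle_irq=None, action=None):
        self.handle_irq = handle_irq  # flow handler; None models a NULL pointer
        self.action = action          # driver handler, if one was registered


def dispatch(descs, irq):
    """Look up the descriptor for irq and call its flow handler."""
    desc = descs.get(irq, IrqDesc())
    if desc.handle_irq is None:
        # The model reports the problem; a real call through a NULL function
        # pointer would instead fault at address 0, as in the oops.
        raise RuntimeError(f"irq {irq}: handle_irq is NULL")
    return desc.handle_irq(irq, desc)


# Only IRQ 23 has a handler in this model; 163 (0xa3) has none.
descs = {23: IrqDesc(handle_irq=lambda irq, desc: f"irq {irq} handled")}

print(dispatch(descs, 23))
try:
    dispatch(descs, 163)
except RuntimeError as err:
    print(err)
```

The point of the model is only that the driver can be correct and the crash still occur, if the interrupt is delivered on a number for which nothing was set up.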
Comment 12 Marcin Slusarz 2008-07-20 03:08:58 UTC
Created attachment 16904 [details]
do_irq_debug_v2.diff
Comment 13 Andi Kleen 2008-07-20 03:28:54 UTC
Can you please add full boot logs for the last working kernel and for the failing
kernel, plus acpidump output?

Also, a bisect would be useful if you have time.
Comment 14 Markus Beck 2008-07-20 05:53:31 UTC
Created attachment 16905 [details]
full dmesg with do_irq_debug_v2.diff patch
Comment 15 Markus Beck 2008-07-20 06:03:11 UTC
Created attachment 16906 [details]
acpidump on kernel 2.6.26
Comment 16 Markus Beck 2008-07-20 06:11:20 UTC
Created attachment 16907 [details]
full dmesg on kernel 2.6.25.10 (works)
Comment 17 Markus Beck 2008-07-20 06:19:25 UTC
Created attachment 16908 [details]
acpidump on kernel  2.6.25.10 (no diff =>2.6.26)
Comment 18 Markus Beck 2008-07-20 08:36:42 UTC
Created attachment 16909 [details]
full dmesg with do_irq_debug_v2.diff patch and enabled acpi/device kernel debug
Comment 19 Len Brown 2008-07-20 13:16:53 UTC
This problem goes away when the exact same kernel
is booted with "pci=noacpi"?

This is surprising, as ACPI's role here is
to return the IRQ number, and it returns 23, just
like the working 2.6.25 and non pci=noacpi case.

I don't understand the /proc/interrupts in comment #7,
as they are in PIC mode, and the dmesg here,
even the pci=noacpi case are all in IOAPIC mode.
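
The PIC versus IOAPIC distinction can be read from the interrupt-controller column of /proc/interrupts. A minimal sketch, assuming the controller names contain "XT-PIC" or "IO-APIC" as they do on kernels of this era; the function name and both sample snippets are hypothetical and are not the contents of the attachments.

```python
def irq_controller_modes(proc_interrupts_text):
    """Return the set of controller modes seen in /proc/interrupts text."""
    modes = set()
    for line in proc_interrupts_text.splitlines():
        if "IO-APIC" in line:
            modes.add("IOAPIC")
        elif "XT-PIC" in line:
            modes.add("PIC")
    return modes


# Hypothetical excerpts for illustration only.
SAMPLE_PIC = "  0:       1234    XT-PIC-XT        timer\n 11:         56    XT-PIC-XT        eth0\n"
SAMPLE_APIC = "  0:       1234   IO-APIC-edge      timer\n 23:         56   IO-APIC-fasteoi   eth0\n"

print(irq_controller_modes(SAMPLE_PIC))
print(irq_controller_modes(SAMPLE_APIC))
```

If the two captures were taken with different boot options, they would report different modes, which is one possible explanation for the mismatch noted above.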

I too recommend a bisect. Start with drivers/net/via-velocity*
Comment 20 Markus Beck 2008-07-20 15:29:33 UTC
Created attachment 16911 [details]
cat /proc/interrupts with APIC
Comment 21 Markus Beck 2008-07-20 15:40:08 UTC
There is no problem when using "pci=noacpi" on the command line with the exact same kernel.

The interface also works without "pci=noacpi" on kernel version 2.6.25 (same config).

I trimmed the kernel config independently and noticed that with it the interface runs smoothly.
The debug message is still printed (... disabled irq set ...).

I saved both config files and checked the differences.
I then tested the most important settings, but to my surprise the interface still worked.
As you can imagine, compiling on that PC (1 GHz) is painful, especially over 18 complete rounds.
Comment 22 Markus Beck 2008-07-20 15:44:23 UTC
Created attachment 16912 [details]
kernel config (kernel panic)
Comment 23 Markus Beck 2008-07-20 15:45:20 UTC
Created attachment 16913 [details]
kernel config (eth0 work with irq message)
Comment 24 Markus Beck 2008-07-20 23:58:06 UTC
status:

ifconfig eth0 up
irq 163, desc: c0401fe8, depth: 1, count: 0, unhandled: 0
->handle_irq():  c013010f, handle_bad_irq+0x0/0x199
->chip(): c0402d40, no_irq_chip+0x0/0x40
->action(): 00000000
  IRQ_DISABLED set
unexpected IRQ trap at vector a3
Velocity is AUTO mode
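
Note on the numbers: the debug output and the original oops appear to describe the same event. Vector a3 is hexadecimal for 163, which matches "irq 163" here and the value 000000a3 in EAX and EBX in the oops, whereas ACPI reports IRQ 23 for the device (comment #19). A short check of the conversion; the helper name is an assumption made for this sketch.

```python
def vector_to_irq_number(hex_text):
    """Convert a hexadecimal vector string such as 'a3' to an integer."""
    return int(hex_text, 16)


vector = vector_to_irq_number("a3")     # "unexpected IRQ trap at vector a3"
eax = vector_to_irq_number("000000a3")  # EAX as printed in the oops
print(vector, eax, vector == eax)       # prints: 163 163 True
```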
Comment 25 Jike Song 2008-07-28 10:03:39 UTC
Here you have two NICs: eth0 is the VIA Rhine II and eth1 is the VIA Velocity. Moreover, when you run 'ifconfig eth0 up', eth1's open method is called?


(In reply to comment #0)
Comment 26 Markus Beck 2008-07-29 07:28:43 UTC
The interfaces are renamed at boot time (rc.S in Debian): eth0 <==> eth1

device: 'eth0': device_rename: renaming to 'eth0_rename'
device: 'eth1': device_rename: renaming to 'eth0'
device: 'eth0_rename': device_rename: renaming to 'eth1'
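
The three renames above amount to a swap through a temporary name, which would explain why "ifconfig eth0 up" ends up in velocity_open even though the probe-time message for eth0 names the VIA Rhine II. A minimal sketch of that swap; the dictionary, the rename helper and the driver labels are illustrative assumptions, with the initial assignment taken from comment #25.

```python
def rename(devices, old, new):
    """Rename a device, refusing to overwrite a name that is already in use."""
    if new in devices:
        raise ValueError(f"{new} already exists")
    devices[new] = devices.pop(old)


# Assignment at probe time, as described in comment #25.
devices = {"eth0": "via-rhine", "eth1": "via-velocity"}

# The same three steps as in the device_rename messages.
rename(devices, "eth0", "eth0_rename")
rename(devices, "eth1", "eth0")
rename(devices, "eth0_rename", "eth1")

print(devices["eth0"], devices["eth1"])
```

The temporary name is needed because a direct rename of eth1 to eth0 would collide with the existing eth0.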



(In reply to comment #25)
> Here you have two NIC cards, eth0 for VIA Rhine II and eth1 for VIA Velocity.
> And more, when doing 'ifconfig eth0 up', the eth1's open method is called?
Comment 27 Alan 2012-05-22 12:58:15 UTC
Please re-open against a current kernel if still seen