Bug 11297 - OOPS in rt6_fill_node
Summary: OOPS in rt6_fill_node
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV6
Hardware: All Linux
Importance: P1 normal
Assignee: Hideaki YOSHIFUJI
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-09 17:26 UTC by paragw
Modified: 2008-08-11 18:05 UTC
0 users

See Also:
Kernel Version: 2.6.27-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description paragw 2008-08-09 17:26:16 UTC
Latest working kernel version: 2.6.26-rc4
Earliest failing kernel version: 2.6.26
Distribution: NA
Hardware Environment: x86
Software Environment: ip
Problem Description: 

$ ip -f inet6 route get fec0::1

Produces this oops -
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0369b85>] rt6_fill_node+0x175/0x3b0
*pdpt = 0000000036466001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: pcnet32 smsc47m192 i2c_i801 i2c_dev i2c_core r8169 coretemp it87 hwmon_vid lcm e1000e
Pid: 3033, comm: ip Not tainted (2.6.26.2 #1)
EIP: 0060:[<c0369b85>] EFLAGS: 00010246 CPU: 1
EIP is at rt6_fill_node+0x175/0x3b0
EAX: 00000000 EBX: f7115bbc ECX: 00000000 EDX: f7115c60
ESI: f7c1f100 EDI: f7548f00 EBP: f7115bdc ESP: f7115ba4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process ip (pid: 3033, ti=f7114000 task=f64cbc50 task.ti=f7114000)
Stack: f7115bbc 00000000 f7115c54 f7115bc0 f7115c60 f6d75078 00000000 f7115bdc
       c036a5f0 c036b360 00000000 f75487a0 00000000 f7548f00 f7115c9c c036c30e
       f7115c70 00000000 00000018 00000bd9 489b2024 00000000 00000000 00000000
Call Trace:
 [<c036a5f0>] ? ip6_route_output+0x50/0xa0
 [<c036b360>] ? ip6_pol_route_output+0x0/0x20
 [<c036c30e>] ? inet6_rtm_getroute+0x16e/0x200
 [<c036c1a0>] ? inet6_rtm_getroute+0x0/0x200
 [<c030ef19>] ? rtnetlink_rcv_msg+0x1b9/0x1f0
 [<c030ed60>] ? rtnetlink_rcv_msg+0x0/0x1f0
 [<c031426d>] ? netlink_rcv_skb+0x8d/0xb0
 [<c030ed57>] ? rtnetlink_rcv+0x17/0x20
 [<c031402d>] ? netlink_unicast+0x23d/0x270
 [<c030162a>] ? memcpy_fromiovec+0x4a/0x70
 [<c0314811>] ? netlink_sendmsg+0x1c1/0x290
 [<c02fa165>] ? sock_sendmsg+0xc5/0xf0
 [<c01363a0>] ? autoremove_wake_function+0x0/0x50
 [<c01363a0>] ? autoremove_wake_function+0x0/0x50
 [<c02fa165>] ? sock_sendmsg+0xc5/0xf0
 [<c0217f37>] ? copy_from_user+0x37/0x70
 [<c03018ec>] ? verify_iovec+0x2c/0x90
 [<c02fa29a>] ? sys_sendmsg+0x10a/0x220
 [<c015ab08>] ? __inc_zone_page_state+0x18/0x20
 [<c01642ed>] ? __page_set_anon_rmap+0x2d/0x40
 [<c0164325>] ? page_add_new_anon_rmap+0x25/0x30
 [<c015eda6>] ? handle_mm_fault+0x606/0x750
 [<c0160f5e>] ? vma_adjust+0xfe/0x410
 [<c0113156>] ? do_page_fault+0x126/0x830
 [<c02fb343>] ? sys_socketcall+0x233/0x260
 [<c0102f39>] ? sysenter_past_esp+0x6a/0x91
 =======================
Code: 62 01 00 00 c6 43 01 80 8b 45 0c 85 c0 0f 85 13 02 00 00 8b 45 d8 85 c0 74 3c 8b 86 88 00 00 00 8d 5d e0 31 c9 89 1c 24 8b 55 d8 <8b> 00 e8 d4 e3 ff ff 85 c0 75 20 b9 10 00 00 00 ba 07 00 00 00
EIP: [<c0369b85>] rt6_fill_node+0x175/0x3b0 SS:ESP 0068:f7115ba4
---[ end trace e9f2563374550ae8 ]---


Steps to reproduce:
$ ip -f inet6 route get fec0::1
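
Editorial note on reading the oops above: the location field has the form symbol+offset/size, so rt6_fill_node+0x175/0x3b0 means the fault is 0x175 bytes into a function that is 0x3b0 bytes long. The marked bytes in the Code line, <8b> 00, decode on 32-bit x86 as a load through EAX, and the register dump shows EAX as 00000000, which matches the reported NULL dereference at address 00000000. A minimal sketch of pulling those numbers out of the line follows; the names OOPS_LOCATION and parse_oops_location are hypothetical helpers for illustration, not part of any kernel tool.

```python
import re

# Matches a symbolized oops location such as
# "IP: [<c0369b85>] rt6_fill_node+0x175/0x3b0".
# (Hypothetical helper; the pattern only covers this simple form.)
OOPS_LOCATION = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_oops_location(line):
    """Split an oops location line into address, symbol, offset and size."""
    m = OOPS_LOCATION.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "symbol": m.group("symbol"),
        "address": addr,
        "offset": offset,
        "size": int(m.group("size"), 16),
        # Start of the function = faulting address minus offset into it.
        "function_start": addr - offset,
    }


loc = parse_oops_location("IP: [<c0369b85>] rt6_fill_node+0x175/0x3b0")
print(loc["symbol"], hex(loc["function_start"]), hex(loc["offset"]), hex(loc["size"]))
```

The computed function start can be compared against the kernel's symbol table for the same build to confirm that the symbolization is right.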
Comment 1 paragw 2008-08-09 17:29:45 UTC
Subject    : OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, ip6_route_output, rt6_fill_node+0x175
Submitter  : John Gumb <john.gumb@tandberg.com>
Date       : 2008-08-07 17:00:56 GMT
References : http://article.gmane.org/gmane.linux.kernel/718101

Handled-By : Brian Haley <brian.haley@hp.com>
Patch : http://article.gmane.org/gmane.linux.network/102189
Comment 2 paragw 2008-08-09 17:30:17 UTC
Patch not in today's Linus git - will close after it lands there.
Comment 3 paragw 2008-08-09 17:57:19 UTC
Not sure the patch is completely correct - it prevents the oops but seems to introduce a new regression: where 2.6.24 produces some output for the command in question, with the patch applied all I get is an EOF -

parag@parag-desktop:~$ ip -f inet6 route get fec0::1
EOF on netlink

Reopening.
Comment 4 Anonymous Emailer 2008-08-09 21:31:08 UTC
Reply-To: akpm@linux-foundation.org

On Sat,  9 Aug 2008 17:26:17 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11297
> 
>            Summary: OOPS in rt6_fill_node
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.27-rc2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV6
>         AssignedTo: yoshfuji@linux-ipv6.org
>         ReportedBy: parag.warudkar@gmail.com
> 
> 
> Latest working kernel version: 2.6.26-rc4
> Earliest failing kernel version: 2.6.26

Can you please confirm the version numbers here?  2.6.26-rc4 was OK,
but 2.6.26 and 2.6.27-rc2 are busted?

Brian had a patch but apparently things still aren't right (see the
full bugzilla report for details).


Comment 5 paragw 2008-08-10 07:03:19 UTC
On Sun, Aug 10, 2008 at 12:31 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:

> Can you please confirm the version numbers here?  2.6.26-rc4 was OK,
> but 2.6.26 and 2.6.27-rc2 are busted?
>

John Gumb's original report (referenced in bugzilla) says the issue is
happening since 2.6.26-rc4 and that 2.6.26.2 also had it.
Current mainline also has this problem (tested on everything past 2.6.27-rc2).

Parag
Comment 6 Anonymous Emailer 2008-08-10 17:31:03 UTC
Reply-To: brian.haley@hp.com

Andrew Morton wrote:
> Can you please confirm the version numbers here?  2.6.26-rc4 was OK,
> but 2.6.26 and 2.6.27-rc2 are busted?

This commit would have broken this, which git-whatchanged shows as being 
in 2.6.24-rc4:

commit 5e5f3f0f801321078c897a5de0b4b4304f234da0
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Mon Mar 3 21:44:34 2008 +0900

     [IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().

     Since most users of ipv6_get_saddr() pass non-NULL as
     dst argument, use ipv6_dev_get_saddr() directly.

     Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Most people have default IPv6 routes, which is probably why it hasn't 
been seen.

> Brian had a patch but apparently things still aren't right (see the
> full bugzilla report for details).

John sent me email off-line saying the patch fixed the problem, but he'd 
do more testing over the weekend.

-Brian
Comment 7 paragw 2008-08-10 18:05:11 UTC
On Sun, Aug 10, 2008 at 8:30 PM, Brian Haley <brian.haley@hp.com> wrote:

> John sent me email off-line saying the patch fixed the problem, but he'd do
> more testing over the weekend.

I did test the patch and it now spits out "EOF on netlink" like
mentioned in bugzilla, which is not consistent with earlier behavior
which was to output -
unreachable fec0::1 dev lo  table unspec  proto none  src ::1  metric
-1  error -101 hoplimit 255 .

Parag
Comment 8 Hideaki YOSHIFUJI 2008-08-11 08:20:44 UTC
Actually I'm curious why rt6i_idev is still NULL there...
I think that is the problem, but okay, I have to agree with Brian's patch.
The output difference might be another problem.  Need more analysis.
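
Editorial note: the point being discussed is that the code reaches a device through rt6i_idev without first checking that rt6i_idev is set. The general guard pattern can be modeled as below. This is only a sketch in Python with hypothetical stand-in classes (Device, Idev, Route) and a hypothetical helper device_for_saddr_lookup; it does not reproduce Brian's patch, and what the address lookup does when it is handed no device is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical stand-in for a network device."""
    name: str


@dataclass
class Idev:
    """Hypothetical stand-in for per-device IPv6 state."""
    dev: Device


@dataclass
class Route:
    """Hypothetical stand-in for a route entry; rt6i_idev may be unset."""
    rt6i_idev: Optional[Idev]


def device_for_saddr_lookup(route):
    """Return the route's device, or None when rt6i_idev is unset.

    The unguarded form, route.rt6i_idev.dev, fails when rt6i_idev is None,
    which is the analogue of the NULL dereference in the oops.
    """
    idev = route.rt6i_idev
    return idev.dev if idev is not None else None


print(device_for_saddr_lookup(Route(rt6i_idev=Idev(dev=Device("lo")))))
print(device_for_saddr_lookup(Route(rt6i_idev=None)))
```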
Comment 9 David S. Miller 2008-08-11 14:32:42 UTC
From: "Parag Warudkar" <parag.warudkar@gmail.com>
Date: Sun, 10 Aug 2008 21:05:07 -0400

> On Sun, Aug 10, 2008 at 8:30 PM, Brian Haley <brian.haley@hp.com> wrote:
> 
> > John sent me email off-line saying the patch fixed the problem, but he'd do
> > more testing over the weekend.
> 
> I did test the patch and it now spits out "EOF on netlink" like
> mentioned in bugzilla, which is not consistent with earlier behavior
> which was to output -
> unreachable fec0::1 dev lo  table unspec  proto none  src ::1  metric
> -1  error -101 hoplimit 255 .

We need to resolve this, Brian?
Comment 10 Anonymous Emailer 2008-08-11 16:45:19 UTC
Reply-To: brian.haley@hp.com

David Miller wrote:
> From: "Parag Warudkar" <parag.warudkar@gmail.com>
> Date: Sun, 10 Aug 2008 21:05:07 -0400
> 
>> On Sun, Aug 10, 2008 at 8:30 PM, Brian Haley <brian.haley@hp.com> wrote:
>>
>>> John sent me email off-line saying the patch fixed the problem, but he'd do
>>> more testing over the weekend.
>> I did test the patch and it now spits out "EOF on netlink" like
>> mentioned in bugzilla, which is not consistent with earlier behavior
>> which was to output -
>> unreachable fec0::1 dev lo  table unspec  proto none  src ::1  metric
>> -1  error -101 hoplimit 255 .
> 
> We need to resolve this, Brian?

I don't see "EOF on netlink" on 2.6.27-rc2.  I can dig-up a 2.6.24 
kernel, I'm just wondering if that EOF happens when the IPv6 module 
isn't loaded or something?

-Brian
Comment 11 paragw 2008-08-11 17:02:31 UTC
On Mon, Aug 11, 2008 at 7:44 PM, Brian Haley <brian.haley@hp.com> wrote:

> I don't see "EOF on netlink" on 2.6.27-rc2.  I can dig-up a 2.6.24 kernel,
> I'm just wondering if that EOF happens when the IPv6 module isn't loaded or
> something?

With ipv6 not loaded it returns not supported or something similar -
correct of course.

What output do you see with your patch?

Parag
Comment 12 Anonymous Emailer 2008-08-11 17:41:50 UTC
Reply-To: brian.haley@hp.com

Parag Warudkar wrote:
> 
> What output do you see with your patch?

# ip -f inet6 route get fec0::1
unreachable fec0::1 dev lo  table unspec  proto none  src 
2001:1890:1109:a10:218:feff:fe7f:49c8  metric -1  error -101 hoplimit 255

And if I down eth0 I get:

# ip -f inet6 route get fec0::1
unreachable fec0::1 dev lo  table unspec  proto none  src ::1  metric -1 
  error -101 hoplimit 255

This is 2.6.27-rc2; like I said, I'm building a 2.6.24 kernel now.

-Brian
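
Editorial note: the "error -101" field in the output above is a negated errno value. On Linux, errno 101 is ENETUNREACH ("Network is unreachable"), which is consistent with the "unreachable" route type that ip prints. The sketch below decodes that field with Python's standard errno module; explain_route_error is a hypothetical helper name, and errno numbering is platform-specific, so the lookup is only meaningful when run on Linux.

```python
import errno
import os
import re


def explain_route_error(line):
    """Return (errno name, message) for an 'error -N' field, or None."""
    m = re.search(r"\berror (-?\d+)\b", line)
    if m is None:
        return None
    # The kernel reports the value negated; errno tables use the positive form.
    code = abs(int(m.group(1)))
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


line = ("unreachable fec0::1 dev lo  table unspec  proto none  src ::1  "
        "metric -1  error -101 hoplimit 255")
print(explain_route_error(line))
```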
Comment 13 paragw 2008-08-11 18:05:23 UTC
Grrr. It looks like I was bitten by the infamous Netlink "No buffer space available" error, which I had somehow overlooked. Applying your patch to a kernel that boots without the Netlink buffer space error shows the right output.

Closing. Thanks.
Comment 14 Anonymous Emailer 2008-08-11 18:11:04 UTC
Reply-To: eugeneteo@kernel.sg

On Tue, Aug 12, 2008 at 8:41 AM, Brian Haley <brian.haley@hp.com> wrote:
> Parag Warudkar wrote:
>>
>> What output do you see with your patch?
>
> # ip -f inet6 route get fec0::1
> unreachable fec0::1 dev lo  table unspec  proto none  src
> 2001:1890:1109:a10:218:feff:fe7f:49c8  metric -1  error -101 hoplimit 255

Hmm, I tried it on an older kernel that doesn't have Yoshfuji-san's
ipv6_get_saddr() changes,
and it should display the output with the loopback MAC address instead
of ethX MAC address.
Correct me if I am wrong.

Thanks,
Eugene
Comment 15 David S. Miller 2008-08-11 18:40:27 UTC
From: "Eugene Teo" <eugeneteo@kernel.sg>
Date: Tue, 12 Aug 2008 09:28:05 +0800

> On Tue, Aug 12, 2008 at 9:10 AM, Eugene Teo <eugeneteo@kernel.sg> wrote:
> > On Tue, Aug 12, 2008 at 8:41 AM, Brian Haley <brian.haley@hp.com> wrote:
> >> Parag Warudkar wrote:
> >>>
> >>> What output do you see with your patch?
> >>
> >> # ip -f inet6 route get fec0::1
> >> unreachable fec0::1 dev lo  table unspec  proto none  src
> >> 2001:1890:1109:a10:218:feff:fe7f:49c8  metric -1  error -101 hoplimit 255
> >
> > Hmm, I tried it on an older kernel that doesn't have Yoshfuji-san's
> > ipv6_get_saddr() changes,
> > and it should display the output with the loopback MAC address instead
> > of ethX MAC address.
> > Correct me if I am wrong.
> 
> Evidence of me still sleepy. Not the MAC address but the ipv6 address...

Hmmm... from what I understand so far based upon Parag's most
recent reply, Brian's patch should be OK.

Does everyone else agree?
Comment 16 Anonymous Emailer 2008-08-11 18:42:25 UTC
Reply-To: eugeneteo@kernel.sg

On Tue, Aug 12, 2008 at 9:10 AM, Eugene Teo <eugeneteo@kernel.sg> wrote:
> On Tue, Aug 12, 2008 at 8:41 AM, Brian Haley <brian.haley@hp.com> wrote:
>> Parag Warudkar wrote:
>>>
>>> What output do you see with your patch?
>>
>> # ip -f inet6 route get fec0::1
>> unreachable fec0::1 dev lo  table unspec  proto none  src
>> 2001:1890:1109:a10:218:feff:fe7f:49c8  metric -1  error -101 hoplimit 255
>
> Hmm, I tried it on an older kernel that doesn't have Yoshfuji-san's
> ipv6_get_saddr() changes,
> and it should display the output with the loopback MAC address instead
> of ethX MAC address.
> Correct me if I am wrong.

Evidence of me still sleepy. Not the MAC address but the ipv6 address...

Eugene
Comment 17 Anonymous Emailer 2008-08-11 19:07:17 UTC
Reply-To: brian.haley@hp.com

David Miller wrote:
> Hmmm... from what I understand so far based upon Parag's most
> recent reply, Brian's patch should be OK.
> 
> Does everyone else agree?

Just an fyi I think part of the confusion was the output I posted:

# ip -f inet6 route get fec0::1
unreachable fec0::1 dev lo  table unspec  proto none  src
2001:1890:1109:a10:218:feff:fe7f:49c8  metric -1  error -101 hoplimit 255

On my system I have a global address on eth0, so that's printed in my 
output.  Others don't have a global, so see ::1, which is expected.  I 
see the same behavior on my Debian Lenny 2.6.18 box as 2.6.27, so my 
patch doesn't seem to have changed anything.

-Brian
