Bug 11391

Summary: Kernel NULL pointer dereference in do_notify_parent()
Product: Process Management
Reporter: Robert Rex (robert.rex)
Component: Other
Assignee: process_other
Status: CLOSED CODE_FIX
Severity: normal
CC: bunk, oleg
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.26.3
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: test program to reproduce failure
             stupid test patch
             [PATCH] change ->child_reaper when init->group_leader exits

Description Robert Rex 2008-08-21 05:58:51 UTC
Latest working kernel version: 2.6.26.3

Earliest failing kernel version: 2.6.25.4 (didn't test earlier kernels)

Distribution: CentOS 5.1 (with Vanilla kernel from kernel.org)

Hardware Environment: several x86_64 platforms (AMD Opteron, Intel Xeon)

Problem Description:
-------------------------------------
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
IP: [<ffffffff8023d5d0>] do_notify_parent+0x66/0x194
PGD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror
dm_log dm_multipath dm_mod video output sbs sbshc battery acpi_memhotplug ac lp
sg floppy button tg3 serio_raw parport_pc parport k8temp hwmon i2c_amd756
i2c_amd8111 i2c_core amd_rng shpchp pcspkr usb_storage 3w_9xxx sata_sil libata
sd_mod scsi_mod raid456 async_xor async_memcpy async_tx xor ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
Pid: 3800, comm: sshd Not tainted 2.6.26.3 #1
RIP: 0010 [<ffffffff8023d5d0>]  [<ffffffff8023d5d0>] do_notify_parent+0x66/0x194
RSP: 0018:ffff8101fd943c78  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8101fe08f2f0 RCX: ffff8101fd956870
RDX: ffff8101fe08f4c0 RSI: 0000000000000011 RDI: ffff8101fe08f2f0
RBP: 0000000000000000 R08: 0000000000000009 R09: 0000000000000009
R10: 0000000000000002 R11: ffffffff802f1c0e R12: 0000000000000011
R13: ffff8101fe4e00c0 R14: 0000000000000000 R15: 0000000000000001
FS:  00007fce4b4b2710(0000) GS:ffff8101ff08c8c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sshd (pid: 3800, threadinfo ffff8101fd942000, task ffff8101fe4e00d0)
Stack:  0000000000000011 ffff8101fec76630 ffff8101fe0e1180 ffffffff8029d597
 0000000000000008 ffff8101fe0e1180 ffff8101fe7e87c0 ffffffff802a1915
 ffff8101fd856c40 ffff8101fe0e1180 ffff8101fd856c40 0000000000000000
Call Trace:
[<ffffffff8029d597>] dput+0x26/0xe7
[<ffffffff802a1915>] mntput_no_expire+0x20/0x119
[<ffffffff8028b557>] filp_close+0x5d/0x65
[<ffffffff80233cd1>] reparent_thread+0x139/0x14d
[<ffffffff802350ba>] do_exit+0x39a/0x68c
[<ffffffff80235412>] do_group_exit+0x66/0x96
[<ffffffff8023d4f7>] get_signal_to_deliver+0x2ea/0x305
[<ffffffff8020b166>] do_notify_resume+0xaf/0x7de
[<ffffffff802435de>] autoremove_wake_function+0x0/0x2e
[<ffffffff80236198>] current_fs_time+0x1e/0x24
[<ffffffff8036dfdb>] tty_ldisc_deref+0x62/0x75
[<ffffffff8025bdfe>] audit_syscall_exit+0x2e4/0x303
[<ffffffff8020bf8c>] int_signal+0x12/0x17

Code: 00 48 39 87 30 02 00 00 74 04 0f 0b eb fe 44 89 24 24 c7 44 24 04 00 00 00
 00 48 8b 83 b8 01 00 00 48 89 df 48 8b 80 98 04 00 00 <48> 8b 70 20 e8 57 39 00
 00 48 8b 93 a0 04 00 00 89 44 24 10 8b
RIP  [<ffffffff8023d5c9>] do_notify_parent+0x66/0x194
 RSP <ffff8101f7535c78>
CR2: 0000000000000020
---[ end trace 8df15d3ad47033c0 ]---
Fixing recursive fault but reboot is needed!
-------------------------------------

The problem occurs with PID namespaces enabled. After the child reaper (init) of a new PID namespace is killed with SIGKILL, the kernel crashes. I did some debugging and, as far as I can tell, the NULL pointer dereference happens on this line:

info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);

I added a BUG_ON(!tsk->parent->nsproxy) on the line just above it, and the BUG_ON fired before the kernel crashed.

Software Environment:
(test program attached)

Steps to reproduce:

Compile the attached test program with "gcc -o ns_exec ns_exec.c -lpthread". When started, it creates a new PID namespace, mounts a proc filesystem in it, creates a new thread and fork()s off an sshd.
Log in via SSH (the port of the started sshd is hardcoded in the test program, so you'll have to change it if you need a different one ;-) ) and run "kill -9 1". On my machines, the kernel crashed in over 90% of all tests.
Comment 1 Robert Rex 2008-08-21 06:00:07 UTC
Created attachment 17353
test program to reproduce failure
Comment 2 Anonymous Emailer 2008-08-21 09:24:17 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 21 Aug 2008 05:58:52 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11391
> 
>            Summary: Kernel NULL pointer dereference in do_notify_parent()
>            Product: Process Management
>            Version: 2.5
>      KernelVersion: 2.6.26.3

Should have been 2.6.26.4?

>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: process_other@kernel-bugs.osdl.org
>         ReportedBy: robert.rex@exasol.com
> 
> 
> Latest working kernel version: 2.6.26.3
> 
> Earliest failing kernel version: 2.6.25.4 (didn't test with former kernels)

Appears to be a regression in -stable.   Did any namespacy things go into
2.6.26.4?

> [...]
> 
Comment 3 Oleg Nesterov 2008-08-21 10:16:06 UTC
Heh. The bug looks very obvious, but the fix is not.

We don't change the child reaper when the main thread of /sbin/init
exits. There is a lengthy comment in exit_child_reaper(), and yes,
pid_ns->child_reaper remains valid. But group_leader->nsproxy == NULL
after exit_task_namespaces().

So "kill -9 1" kills init. If the main thread dies first and clears
its ->nsproxy, the next re-parenting triggers this bug. I haven't verified
this, but I am almost sure.

I'll try to think more later, but (sorry!) I can do nothing until
Sunday.

Thanks Robert!

I'll send the stupid test patch in a minute, could you try it?
Comment 4 Oleg Nesterov 2008-08-21 10:19:00 UTC
Created attachment 17356 [details]
stupid test patch
Comment 5 Serge Hallyn 2008-08-21 14:59:05 UTC
Quoting Andrew Morton (akpm@linux-foundation.org):
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Thu, 21 Aug 2008 05:58:52 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=11391
> > 
> >            Summary: Kernel NULL pointer dereference in do_notify_parent()
> >            Product: Process Management
> >            Version: 2.5
> >      KernelVersion: 2.6.26.3
> 
> Should have been 2.6.26.4?
> 
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: process_other@kernel-bugs.osdl.org
> >         ReportedBy: robert.rex@exasol.com
> > 
> > 
> > Latest working kernel version: 2.6.26.3
> > 
> > Earliest failing kernel version: 2.6.25.4 (didn't test with former kernels)

I'm sorry, I'm confused.  Which kernel version exactly fails?  I don't
see 2.6.26.4 on kernel.org, and Linus' tree (just pulled) doesn't fail
for me.

thanks,
-serge
Comment 6 Robert Rex 2008-08-21 23:34:41 UTC
Andrew Morton wrote:
>> http://bugzilla.kernel.org/show_bug.cgi?id=11391
>>
>>            Summary: Kernel NULL pointer dereference in do_notify_parent()
>>            Product: Process Management
>>            Version: 2.5
>>      KernelVersion: 2.6.26.3
> 
> Should have been 2.6.26.4?
> 
>> [...]
>>
>> Latest working kernel version: 2.6.26.3
>>
>> Earliest failing kernel version: 2.6.25.4 (didn't test with former kernels)
> 
> Appears to be a regression in -stable.   Did any namespacy things go into
> 2.6.26.4?

Oops, my mistake. I accidentally marked 2.6.26.3 as the latest working 
kernel. I didn't mean to do so (I tested 2.6.25.4, 2.6.26.2 and 2.6.26.3, 
and all of these kernels crashed).

As far as I understand the problem and the relevant source code right 
now, I assume that this problem was introduced with PID namespaces in 
2.6.24 and did not affect earlier kernels (because they didn't have this 
feature).

I'll try Oleg's posted patch as soon as possible and see what happens.

Sorry for that confusion!

Thanks,
Robert
Comment 7 Robert Rex 2008-08-22 01:01:35 UTC
> [...]
> 
> If the main thread dies first and clears
> ->nsproxy, the next re-parenting triggers this bug. I didn't verify
> this, but I am almost sure.
> 
> [...]
> 
> I'll send the stupid test patch in a minute, could you try it?

Thanks for your quick reply and analysis! I tested 2.6.25.4, 2.6.26.2 
and 2.6.26.3 again with your patch and there were no more kernel panics 
(I also looked into dmesg and there were no suspicious warnings etc. 
after killing the child reaper). So your assumption seems to be 
absolutely right.

Thanks again,
Robert
Comment 8 Oleg Nesterov 2008-08-24 10:14:36 UTC
Created attachment 17416
[PATCH] change ->child_reaper when init->group_leader exits

The patch is against 2.6.27-rc4 and hopefully fixes this bug.
Tested with Robert's test case.
Comment 9 Adrian Bunk 2008-09-02 23:39:11 UTC
Fixed by commit 950bbabb5a804690a0201190de5c22837f72f83f