Bug 12201

Summary: long wait in call_usermodehelper() / queue_work() / wait_for_completion()
Product: Process Management Reporter: Michael Spiegle (mike)
Component: OtherAssignee: process_other
Status: CLOSED OBSOLETE    
Severity: normal CC: alan, mike
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26.8 Subsystem:
Regression: No Bisected commit-id:
Attachments: Kernel config
Task Trace

Description Michael Spiegle 2008-12-11 14:15:20 UTC
Latest working kernel version: None
Earliest failing kernel version: 2.6.26 (can't test any older)
Distribution: CentOS 5
Hardware Environment: Sun x4450, 16-cores, 128GB of RAM
Software Environment: CentOS 5 + Apache webserver
Problem Description:
My problem started with SSH using the audit library, and my kernel not having AUDIT support.  During strace, any call to socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT) took 1-2 seconds to return.  During this time, sys% was high.

As I continued to dig deeper (using lots of printks), I found that these delays were caused by the netlink_create() code calling request_module() to find/load a module for AUDIT support which doesn't exist.

Continuing to dig, I found that request_module() uses call_usermodehelper() to run /sbin/modprobe to find/load the module.

The farthest I got is that after the process is created, we call wait_for_completion() to get the result of that process.  This waiting process takes 1-2 seconds.

The big problem in troubleshooting here is that this only starts to happen after the server has been online for a while (10 days maybe) and serving lots of traffic.  The delay gradually builds up and maxes out at around 2 seconds.

If I manually call /sbin/modprobe on the commandline and provide it the same arguments that call_usermodehelper() uses, the command returns instantly 100% of the time (assuming server has been on for a while).

If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming server has been online for a while).  Certain protocol types given to socket() have zero delay (because no module needs to be loaded).

Steps to reproduce:
Once server has been online for a while, a simple call to socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT) shows the problem.
Comment 1 Michael Spiegle 2008-12-11 14:20:05 UTC
I'd like to add that once the kernel is in this funky state, I can stop all processes/traffic to the box (mpstat returns zeros across the board), and creating this socket still has a delay.  The issue does not appear to be related to any sort of load at the time.
Comment 2 Anonymous Emailer 2008-12-11 14:38:00 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 11 Dec 2008 14:15:21 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12201
> 
>            Summary: long wait in call_usermodehelper() / queue_work() /
>                     wait_for_completion()
>            Product: Process Management
>            Version: 2.5
>      KernelVersion: 2.6.26.8
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: process_other@kernel-bugs.osdl.org
>         ReportedBy: mike@nauticaltech.com
> 
> 
> Latest working kernel version: None
> Earliest failing kernel version: 2.6.26 (can't test any older)

It'd be great if you could test something more recent please.

> Distribution: CentOS 5
> Hardware Environment: Sun x4450, 16-cores, 128GB of RAM
> Software Environment: CentOS 5 + Apache webserver
> Problem Description:
> My problem started with SSH using the audit library, and my kernel not having
> AUDIT support.  During strace, any call to socket(PF_NETLINK, SOCK_RAW,
> NETLINK_AUDIT) took 1-2 seconds to return.  During this time, sys% was high.

Well this is bad.  We don't want the kernel calling out to userspace
each time you run socket(PF_NETLINK, ...).  The performance could be
awful.

I don't know if this is a net problem, an audit problem or whatever. 
Probably the offending kernel code simply shouldn't exist if
CONFIG_AUDIT=n.

Please attach a copy of the config to
http://bugzilla.kernel.org/show_bug.cgi?id=12201

> As I continued to dig deeper (using lots of printks), I found that these
> delays
> were caused by the netlink_create() code calling request_module() to
> find/load
> a module for AUDIT support which doesn't exist.
> 
> Continuing to dig, I found that request_module() uses call_usermodehelper()
> to
> run /sbin/modprobe to find/load the module.
> 
> The farthest I got is that after the process is created, we call
> wait_for_completion() to get the result of that process.  This waiting
> process
> takes 1-2 seconds.
> 
> The big problem in troubleshooting here is that this only starts to happen
> after the server has been online for a while (10 days maybe) and serving lots
> of traffic.  The delay gradually builds up and maxes out at around 2 seconds.
> 
> If I manually call /sbin/modprobe on the commandline and provide it the same
> arguments that call_usermodehelper() uses, the command returns instantly 100%
> of the time (assuming server has been on for a while).
> 
> If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW,
> NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming
> server
> has been online for a while).  Certain protocol types given to socket() have
> zero delay (because no module needs to be loaded).
> 
> Steps to reproduce:
> Once server has been online for a while, a simple call to socket(PF_NETLINK,
> SOCK_RAW, NETLINK_AUDIT) shows the problem.

OK, weird.

Please get sysrq working then get us a task trace, so we can see who is
sleeping where.  Do this:


- run your "small pilot program"

- wait one second (so we catch it while it is delaying)

- echo t > /proc/sysrq-trigger

- dmesg -s 1000000 > foo

- send us a copy of foo.

(foo will be large, so it would be best to attach it to
http://bugzilla.kernel.org/show_bug.cgi?id=12201 then email us the URL)

(Using `echo w > /proc/sysrq-trigger' would work too, if we know that
the offending processes are stuck in D state.  It will produce less
output)
Comment 3 Anonymous Emailer 2008-12-11 14:52:08 UTC
Reply-To: zbr@ioremap.net

Hi.

On Thu, Dec 11, 2008 at 02:37:58PM -0800, Andrew Morton (akpm@linux-foundation.org) wrote:
> > If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW,
> > NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming
> server
> > has been online for a while).  Certain protocol types given to socket()
> have
> > zero delay (because no module needs to be loaded).
> > 
> > Steps to reproduce:
> > Once server has been online for a while, a simple call to
> socket(PF_NETLINK,
> > SOCK_RAW, NETLINK_AUDIT) shows the problem.
> 
> OK, weird.
> 
> Please get sysrq working then get us a task trace, so we can see who is
> sleeping where.  Do this:
> 
> 
> - run your "small pilot program"

Or send it to us and we will quickly fix the bug :)

Thank you!
Comment 4 Kay Sievers 2008-12-11 14:54:43 UTC
On Thu, Dec 11, 2008 at 23:37, Andrew Morton <akpm@linux-foundation.org> wrote:
>> As I continued to dig deeper (using lots of printks), I found that these
>> delays
>> were caused by the netlink_create() code calling request_module() to
>> find/load
>> a module for AUDIT support which doesn't exist.
>>
>> Continuing to dig, I found that request_module() uses call_usermodehelper()
>> to
>> run /sbin/modprobe to find/load the module.
>>
>> The farthest I got is that after the process is created, we call
>> wait_for_completion() to get the result of that process.  This waiting
>> process
>> takes 1-2 seconds.
>>
>> The big problem in troubleshooting here is that this only starts to happen
>> after the server has been online for a while (10 days maybe) and serving
>> lots
>> of traffic.  The delay gradually builds up and maxes out at around 2
>> seconds.
>>
>> If I manually call /sbin/modprobe on the commandline and provide it the same
>> arguments that call_usermodehelper() uses, the command returns instantly
>> 100%
>> of the time (assuming server has been on for a while).
>>
>> If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW,
>> NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming
>> server
>> has been online for a while).  Certain protocol types given to socket() have
>> zero delay (because no module needs to be loaded).
>>
>> Steps to reproduce:
>> Once server has been online for a while, a simple call to socket(PF_NETLINK,
>> SOCK_RAW, NETLINK_AUDIT) shows the problem.

If you replace /sbin/modprobe in the kernel module loader, does the
delay go away:
  echo /bin/true > /proc/sys/kernel/modprobe
?

Kay
Comment 5 Anonymous Emailer 2008-12-11 15:05:53 UTC
Reply-To: zbr@ioremap.net

On Fri, Dec 12, 2008 at 01:51:58AM +0300, Evgeniy Polyakov (zbr@ioremap.net) wrote:
> > > If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW,
> > > NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming
> server
> > > has been online for a while).  Certain protocol types given to socket()
> have
> > > zero delay (because no module needs to be loaded).
> > > 
> > > Steps to reproduce:
> > > Once server has been online for a while, a simple call to
> socket(PF_NETLINK,
> > > SOCK_RAW, NETLINK_AUDIT) shows the problem.
> > 
> > OK, weird.
> > 
> > Please get sysrq working then get us a task trace, so we can see who is
> > sleeping where.  Do this:
> > 
> > 
> > - run your "small pilot program"
> 
> Or send it to us and we will quickly fix the bug :)
> 
> Thank you!

Btw, does this ugly hack fix the problem?
According to your description it may be the case.

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 480184a..2b64eb5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -434,6 +434,10 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol)
 	if (protocol < 0 || protocol >= MAX_LINKS)
 		return -EPROTONOSUPPORT;
 
+#ifndef CONFIG_AUDIT
+	if (protocol == NETLINK_AUDIT)
+		return -EPROTONOSUPPORT;
+#endif
 	netlink_lock_table();
 #ifdef CONFIG_MODULES
 	if (!nl_table[protocol].registered) {
Comment 6 Michael Spiegle 2008-12-12 11:01:00 UTC
Created attachment 19263 [details]
Kernel config
Comment 7 Michael Spiegle 2008-12-12 11:23:14 UTC
Created attachment 19264 [details]
Task Trace
Comment 8 Michael Spiegle 2008-12-12 11:43:23 UTC
> It'd be great if you could test something more recent please.

Will do.  It takes a little bit of time to reproduce, so I'll respond 
now with information from my current kernel and update once the new 
kernel is running.

> Please attach a copy of the config to
> http://bugzilla.kernel.org/show_bug.cgi?id=12201

Done: http://bugzilla.kernel.org/attachment.cgi?id=19263

> Please get sysrq working then get us a task trace, so we can see who is
> sleeping where.  Do this:

Done: http://bugzilla.kernel.org/attachment.cgi?id=19264

I wanted to get the call to /sbin/modprobe in there as well, so I had to 
do it about 10 times before I could catch it.  I also stopped 
unimportant processes on that box to eliminate additional variables.

There is one other variable I didn't mention.  The machine is diskless 
and running on an nfs root.  I have tried to eliminate this variable via 
simple testing (just running /sbin/modprobe at the commandline returns 
instantly), but wanted to mention it.

The box was 100% idle before starting the pilot program and running the 
task trace.  I added gettimeofday() around my socket() call and got 
about 1290000us - 1490000us on average.  Using a socket() call with 
protocol NETLINK_ROUTE returns in about 5us - 8us.  Certain other 
protocol types like NETLINK_FIREWALL also have the same long delay.


Michael Spiegle
mike@nauticaltech.com
Comment 9 Michael Spiegle 2008-12-12 11:47:39 UTC
> If you replace /sbin/modprobe in the kernel module loader, does the
> delay go away:
>   echo /bin/true > /proc/sys/kernel/modprobe

Yes, that does reduce the delay significantly.  The timing of the 
socket() call went from about 1500000us to 430000us.  This points the 
finger at modprobe it seems?

Michael Spiegle
mike@nauticaltech.com
Comment 10 Michael Spiegle 2008-12-12 11:52:36 UTC
> Btw, does this ugly hack fix the problem?
> According to your description it may be the case.
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 480184a..2b64eb5 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -434,6 +434,10 @@ static int netlink_create(struct net *net, struct socket
> *sock, int protocol)
>       if (protocol < 0 || protocol >= MAX_LINKS)
>               return -EPROTONOSUPPORT;
>  
> +#ifndef CONFIG_AUDIT
> +     if (protocol == NETLINK_AUDIT)
> +             return -EPROTONOSUPPORT;
> +#endif
>       netlink_lock_table();
>  #ifdef CONFIG_MODULES
>       if (!nl_table[protocol].registered) {
> 
> 

Yes, that would take care of it.  My original workaround was to disable 
CONFIG_KMOD.  I only have a few modules compiled (for servers that 
require those modules), but most of my servers don't need any modules. 
This would allow me to manually load modules myself and never have to 
worry about the kernel forking stuff in userspace.  Unfortunately, it 
appears that CONFIG_KMOD is being deprecated soon and I didn't want to 
rely on it.

What is the core reason for removing CONFIG_KMOD?  I personally would 
rather have the choice as to whether my kernel tries to find its own 
modules or not.

Michael Spiegle
mike@nauticaltech.com