Latest working kernel version: None Earliest failing kernel version: 2.6.26 (can't test any older) Distribution: CentOS 5 Hardware Environment: Sun x4450, 16-cores, 128GB of RAM Software Environment: CentOS 5 + Apache webserver Problem Description: My problem started with SSH using the audit library, and my kernel not having AUDIT support. During strace, any call to socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT) took 1-2 seconds to return. During this time, sys% was high. As I continued to dig deeper (using lots of printks), I found that these delays were caused by the netlink_create() code calling request_module() to find/load a module for AUDIT support which doesn't exist. Continuing to dig, I found that request_module() uses call_usermodehelper() to run /sbin/modprobe to find/load the module. The farthest I got is that after the process is created, we call wait_for_completion() to get the result of that process. This waiting process takes 1-2 seconds. The big problem in troubleshooting here is that this only starts to happen after the server has been online for a while (10 days maybe) and serving lots of traffic. The delay gradually builds up and maxes out at around 2 seconds. If I manually call /sbin/modprobe on the commandline and provide it the same arguments that call_usermodehelper() uses, the command returns instantly 100% of the time (assuming server has been on for a while). If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming server has been online for a while). Certain protocol types given to socket() have zero delay (because no module needs to be loaded). Steps to reproduce: Once server has been online for a while, a simple call to socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT) shows the problem.
I'd like to add that once the kernel is in this funky state, I can stop all processes/traffic to the box (mpstat returns zeros across the board), and creating this socket still has a delay. The issue does not appear to be related to any sort of load at the time.
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 11 Dec 2008 14:15:21 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12201 > > Summary: long wait in call_usermodehelper() / queue_work() / > wait_for_completion() > Product: Process Management > Version: 2.5 > KernelVersion: 2.6.26.8 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: process_other@kernel-bugs.osdl.org > ReportedBy: mike@nauticaltech.com > > > Latest working kernel version: None > Earliest failing kernel version: 2.6.26 (can't test any older) It'd be great if you could test something more recent please. > Distribution: CentOS 5 > Hardware Environment: Sun x4450, 16-cores, 128GB of RAM > Software Environment: CentOS 5 + Apache webserver > Problem Description: > My problem started with SSH using the audit library, and my kernel not having > AUDIT support. During strace, any call to socket(PF_NETLINK, SOCK_RAW, > NETLINK_AUDIT) took 1-2 seconds to return. During this time, sys% was high. Well this is bad. We don't want the kernel calling out to userspace each time you run socket(PF_NETLINK, ...). The performance could be awful. I don't know if this is a net problem, an audit problem or whatever. Probably the offending kernel code simply shouldn't exist if CONFIG_AUDIT=n. Please attach a copy of the config to http://bugzilla.kernel.org/show_bug.cgi?id=12201 > As I continued to dig deeper (using lots of printks), I found that these > delays > were caused by the netlink_create() code calling request_module() to > find/load > a module for AUDIT support which doesn't exist. > > Continuing to dig, I found that request_module() uses call_usermodehelper() > to > run /sbin/modprobe to find/load the module. > > The farthest I got is that after the process is created, we call > wait_for_completion() to get the result of that process. This waiting > process > takes 1-2 seconds. > > The big problem in troubleshooting here is that this only starts to happen > after the server has been online for a while (10 days maybe) and serving lots > of traffic. The delay gradually builds up and maxes out at around 2 seconds. > > If I manually call /sbin/modprobe on the commandline and provide it the same > arguments that call_usermodehelper() uses, the command returns instantly 100% > of the time (assuming server has been on for a while). > > If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW, > NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming > server > has been online for a while). Certain protocol types given to socket() have > zero delay (because no module needs to be loaded). > > Steps to reproduce: > Once server has been online for a while, a simple call to socket(PF_NETLINK, > SOCK_RAW, NETLINK_AUDIT) shows the problem. OK, weird. Please get sysrq working then get us a task trace, so we can see who is sleeping where. Do this: - run your "small pilot program" - wait one second (so we catch it while it is delaying) - echo t > /proc/sysrq-trigger - dmesg -s 1000000 > foo - send us a copy of foo. (foo will be large, so it would be best to attach it to http://bugzilla.kernel.org/show_bug.cgi?id=12201 then email us the URL) (Using `echo w > /proc/sysrq-trigger' would work too, if we know that the offending processes are stuck in D state. It will produce less output)
Reply-To: zbr@ioremap.net Hi. On Thu, Dec 11, 2008 at 02:37:58PM -0800, Andrew Morton (akpm@linux-foundation.org) wrote: > > If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW, > > NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming > server > > has been online for a while). Certain protocol types given to socket() > have > > zero delay (because no module needs to be loaded). > > > > Steps to reproduce: > > Once server has been online for a while, a simple call to > socket(PF_NETLINK, > > SOCK_RAW, NETLINK_AUDIT) shows the problem. > > OK, weird. > > Please get sysrq working then get us a task trace, so we can see who is > sleeping where. Do this: > > > - run your "small pilot program" Or send it to us and we will quickly fix the bug :) Thank you!
On Thu, Dec 11, 2008 at 23:37, Andrew Morton <akpm@linux-foundation.org> wrote: >> As I continued to dig deeper (using lots of printks), I found that these >> delays >> were caused by the netlink_create() code calling request_module() to >> find/load >> a module for AUDIT support which doesn't exist. >> >> Continuing to dig, I found that request_module() uses call_usermodehelper() >> to >> run /sbin/modprobe to find/load the module. >> >> The farthest I got is that after the process is created, we call >> wait_for_completion() to get the result of that process. This waiting >> process >> takes 1-2 seconds. >> >> The big problem in troubleshooting here is that this only starts to happen >> after the server has been online for a while (10 days maybe) and serving >> lots >> of traffic. The delay gradually builds up and maxes out at around 2 >> seconds. >> >> If I manually call /sbin/modprobe on the commandline and provide it the same >> arguments that call_usermodehelper() uses, the command returns instantly >> 100% >> of the time (assuming server has been on for a while). >> >> If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW, >> NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming >> server >> has been online for a while). Certain protocol types given to socket() have >> zero delay (because no module needs to be loaded). >> >> Steps to reproduce: >> Once server has been online for a while, a simple call to socket(PF_NETLINK, >> SOCK_RAW, NETLINK_AUDIT) shows the problem. If you replace /sbin/modprobe in the kernel module loader, does the delay go away: echo /bin/true > /proc/sys/kernel/modprobe ? Kay
Reply-To: zbr@ioremap.net On Fri, Dec 12, 2008 at 01:51:58AM +0300, Evgeniy Polyakov (zbr@ioremap.net) wrote: > > > If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW, > > > NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming > server > > > has been online for a while). Certain protocol types given to socket() > have > > > zero delay (because no module needs to be loaded). > > > > > > Steps to reproduce: > > > Once server has been online for a while, a simple call to > socket(PF_NETLINK, > > > SOCK_RAW, NETLINK_AUDIT) shows the problem. > > > > OK, weird. > > > > Please get sysrq working then get us a task trace, so we can see who is > > sleeping where. Do this: > > > > > > - run your "small pilot program" > > Or send it to us and we will quickly fix the bug :) > > Thank you! Btw, does this ugly hack fix the problem? According to your description it may be the case. diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 480184a..2b64eb5 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -434,6 +434,10 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol) if (protocol < 0 || protocol >= MAX_LINKS) return -EPROTONOSUPPORT; +#ifndef CONFIG_AUDIT + if (protocol == NETLINK_AUDIT) + return -EPROTONOSUPPORT; +#endif netlink_lock_table(); #ifdef CONFIG_MODULES if (!nl_table[protocol].registered) {
Created attachment 19263 [details] Kernel config
Created attachment 19264 [details] Task Trace
> It'd be great if you could test something more recent please. Will do. It takes a little bit of time to reproduce, so I'll respond now with information from my current kernel and update once the new kernel is running. > Please attach a copy of the config to > http://bugzilla.kernel.org/show_bug.cgi?id=12201 Done: http://bugzilla.kernel.org/attachment.cgi?id=19263 > Please get sysrq working then get us a task trace, so we can see who is > sleeping where. Do this: Done: http://bugzilla.kernel.org/attachment.cgi?id=19264 I wanted to get the call to /sbin/modprobe in there as well, so I had to do it about 10 times before I could catch it. I also stopped unimportant processes on that box to eliminate additional variables. There is one other variable I didn't mention. The machine is diskless and running on an nfs root. I have tried to eliminate this variable via simple testing (just running /sbin/modprobe at the commandline returns instantly), but wanted to mention it. The box was 100% idle before starting the pilot program and running the task trace. I added gettimeofday() around my socket() call and got about 1290000us - 1490000us on average. Using a socket() call with protocol NETLINK_ROUTE returns in about 5us - 8us. Certain other protocol types like NETLINK_FIREWALL also have the same long delay. Michael Spiegle mike@nauticaltech.com
> If you replace /sbin/modprobe in the kernel module loader, does the > delay go away: > echo /bin/true > /proc/sys/kernel/modprobe Yes, that does reduce the delay significantly. The timing of the socket() call went from about 1500000us to 430000us. This points the finger at modprobe it seems? Michael Spiegle mike@nauticaltech.com
> Btw, does this ugly hack fix the problem? > According to your description it may be the case. > > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c > index 480184a..2b64eb5 100644 > --- a/net/netlink/af_netlink.c > +++ b/net/netlink/af_netlink.c > @@ -434,6 +434,10 @@ static int netlink_create(struct net *net, struct socket > *sock, int protocol) > if (protocol < 0 || protocol >= MAX_LINKS) > return -EPROTONOSUPPORT; > > +#ifndef CONFIG_AUDIT > + if (protocol == NETLINK_AUDIT) > + return -EPROTONOSUPPORT; > +#endif > netlink_lock_table(); > #ifdef CONFIG_MODULES > if (!nl_table[protocol].registered) { > > Yes, that would take care of it. My original workaround was to disable CONFIG_KMOD. I only have a few modules compiled (for servers that require those modules), but most of my servers don't need any modules. This would allow me to manually load modules myself and never have to worry about the kernel forking stuff in userspace. Unfortunately, it appears that CONFIG_KMOD is being deprecated soon and I didn't want to rely on it. What is the core reason for removing CONFIG_KMOD? I personally would rather have the choice as to whether my kernel tries to find its own modules or not. Michael Spiegle mike@nauticaltech.com