Immediately after boot, extreme load average numbers and over 2000 kworker processes are being observed on my main Linux test computer (basically an Ubuntu 16.04 server, no GUI). The worker threads appear to be idle, and do disappear after the nominal 5-minute timeout, depending on whatever else might run in the meantime. However, the number of threads can increase hugely again. The issue occurs with ease on kernels compiled with SLAB.

For SLAB, kernel bisection gave:
801faf0db8947e01877920e848a4d338dd7a99e7 "mm/slab: lockless decision to grow cache"

The following monitoring script was used for the examples below:

#!/bin/dash

while [ 1 ];
do
  echo $(uptime) ::: $(ps -A --no-headers | wc -l) ::: $(ps aux | grep kworker | grep -v u | grep -v H | wc -l)
  sleep 10.0
done

Example (SLAB):

After boot:

22:26:21 up 1 min, 2 users, load average: 295.98, 85.67, 29.47 ::: 2240 ::: 2074
22:26:31 up 1 min, 2 users, load average: 250.47, 82.85, 29.15 ::: 2240 ::: 2074
22:26:41 up 1 min, 2 users, load average: 211.96, 80.12, 28.84 ::: 2240 ::: 2074
...
22:52:34 up 27 min, 3 users, load average: 0.00, 0.43, 5.40 ::: 165 ::: 17
22:52:44 up 27 min, 3 users, load average: 0.00, 0.42, 5.34 ::: 165 ::: 17

Now type: sudo echo "bla":

22:53:14 up 27 min, 3 users, load average: 0.00, 0.38, 5.17 ::: 493 ::: 345
22:53:24 up 28 min, 3 users, load average: 0.00, 0.36, 5.11 ::: 493 ::: 345

That caused 328 new kworker threads.
Now queue just a few (8 in this case) very simple jobs:

22:55:45 up 30 min, 3 users, load average: 0.11, 0.27, 4.38 ::: 493 ::: 345
22:55:55 up 30 min, 3 users, load average: 0.09, 0.26, 4.34 ::: 2207 ::: 2059
22:56:05 up 30 min, 3 users, load average: 0.08, 0.25, 4.29 ::: 2207 ::: 2059

If I look at linux/Documentation/workqueue.txt and do:

echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event

and:

cat /sys/kernel/debug/tracing/trace_pipe > out.txt

I get somewhere between 10,000 and 20,000 occurrences of memcg_kmem_cache_create_func in the file (using my simple test method).

Also tested with kernel 4.8-rc7.
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Tue, 27 Sep 2016 17:57:08 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=172981
>
>             Bug ID: 172981
>            Summary: [bisected] SLAB: extreme load averages and over 2000
>                     kworker threads
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.7+
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Slab Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: dsmythies@telus.net
>         Regression: No
>
> [...]
Created attachment 239831: just a very simple script used to create many kworker processes.
[CC Vladimir]

These are the delayed memcg cache allocations: in a fresh memcg that doesn't have per-memcg caches yet, every accounted allocation schedules a kmalloc'd work item in __memcg_schedule_kmem_cache_create() until the cache is finally available. It looks like those can be many more than the number of slab caches in existence, if there is a storm of slab allocations before the workers get a chance to run.

Vladimir, what do you think of embedding the work item into the memcg_cache_array? That way we make sure we have exactly one work per cache and not an unbounded number of them. The downside of course is that we'd have to keep these things around as long as the memcg is in existence, but that's the only place I can think of that allows us to serialize this.

On Tue, Sep 27, 2016 at 11:10:59AM -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> [...]
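As an illustration of the mechanism described above, here is a simplified C sketch of the scheduling path. It is not the actual mm/memcontrol.c code: the struct create_work, create_cache_func(), and schedule_cache_creation() names are invented for the example, and only the overall shape follows the real __memcg_schedule_kmem_cache_create() path; memcg_create_kmem_cache() is the existing slab helper that actually builds the per-memcg cache.

#include <linux/memcontrol.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/*
 * Simplified sketch only; names and structure approximate the 4.7/4.8
 * memcg code, they are not the real implementation.
 */
struct create_work {
	struct mem_cgroup *memcg;
	struct kmem_cache *root_cache;
	struct work_struct work;
};

static void create_cache_func(struct work_struct *w)
{
	struct create_work *cw = container_of(w, struct create_work, work);

	/*
	 * Creating the per-memcg cache takes slab_mutex, and with the
	 * bisected SLAB change it can also wait in synchronize_sched(),
	 * so these items back up behind one another.
	 */
	memcg_create_kmem_cache(cw->memcg, cw->root_cache);
	kfree(cw);
}

/*
 * Called from the accounted-allocation path whenever the per-memcg
 * cache for this root cache does not exist yet.
 */
static void schedule_cache_creation(struct mem_cgroup *memcg,
				    struct kmem_cache *root_cache)
{
	struct create_work *cw;

	cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
	if (!cw)
		return;

	cw->memcg = memcg;
	cw->root_cache = root_cache;
	INIT_WORK(&cw->work, create_cache_func);

	/*
	 * Nothing records that a creation request is already pending for
	 * this (memcg, root_cache) pair, so a burst of allocations queues
	 * one work item per allocation.  While earlier items sit blocked,
	 * the workqueue code spawns more kworkers to service the rest,
	 * which is where the ~2000 kworker threads and the 10,000-20,000
	 * trace events in the report come from.
	 */
	schedule_work(&cw->work);
}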
By the way, I can eliminate the problem by doing this:
(see also: https://bugzilla.kernel.org/show_bug.cgi?id=172991)

diff --git a/mm/slab.c b/mm/slab.c
index b672710..a4edbfa 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 	 * freed after synchronize_sched().
 	 */
 	if (force_change)
-		synchronize_sched();
+		kick_all_cpus_sync();
 
 fail:
 	kfree(old_shared);
On Tue, Sep 27, 2016 at 08:13:58PM -0700, Doug Smythies wrote:
> By the way, I can eliminate the problem by doing this:
> (see also: https://bugzilla.kernel.org/show_bug.cgi?id=172991)

I think that Johannes found the root cause of the problem and they
(Johannes and Vladimir) will solve it.

However, there is something useful to do on the SLAB side.
Could you test the following patch, please?

Thanks.

---------->8--------------
diff --git a/mm/slab.c b/mm/slab.c
index 0eb6691..39e3bf2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 	 * guaranteed to be valid until irq is re-enabled, because it will be
 	 * freed after synchronize_sched().
 	 */
-	if (force_change)
+	if (n->shared && force_change)
 		synchronize_sched();
 
 fail:
On Wed, Sep 28, 2016 at 02:18:42PM +0900, Joonsoo Kim wrote:
> However, there is something useful to do on the SLAB side.
> Could you test the following patch, please?
>
> -	if (force_change)
> +	if (n->shared && force_change)
> 		synchronize_sched();

Oops...

s/n->shared/old_shared/

Thanks.
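For context, here is a rough sketch of the code pattern the patch above is tweaking, with the suggested s/n->shared/old_shared/ correction already applied. It is deliberately simplified and is not the real setup_kmem_cache_node() from mm/slab.c; the function name and locking details are only approximations.

/*
 * Simplified sketch, not the actual mm/slab.c code.  The point of the
 * fix: only pay for a grace period when there really is an old shared
 * array that lockless readers might still be referencing.
 */
static int setup_node_shared_sketch(struct kmem_cache_node *n,
				    struct array_cache *new_shared,
				    bool force_change)
{
	struct array_cache *old_shared;

	spin_lock_irq(&n->list_lock);
	old_shared = n->shared;		/* NULL when the cache is brand new */
	n->shared = new_shared;
	spin_unlock_irq(&n->list_lock);

	/*
	 * Readers may still hold a pointer to the old array, so wait for
	 * an RCU-sched grace period before freeing it.  For a freshly
	 * created cache old_shared is NULL and there is nothing to
	 * protect, yet the unpatched code still called
	 * synchronize_sched() -- once per per-memcg cache creation --
	 * which is what kept so many kworkers blocked after boot.
	 */
	if (old_shared && force_change)
		synchronize_sched();

	kfree(old_shared);
	return 0;
}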
On 2016.09.27 23:20 Joonsoo Kim wrote:
> On Wed, Sep 28, 2016 at 02:18:42PM +0900, Joonsoo Kim wrote:
>> However, there is something useful to do on the SLAB side.
>> Could you test the following patch, please?
>>
>> -	if (force_change)
>> +	if (n->shared && force_change)
>> 		synchronize_sched();
>
> Oops...
>
> s/n->shared/old_shared/

Yes, that seems to work fine. After boot everything is good.
Then I tried and tried to get it to mess up, but could not.
On Wed, Sep 28, 2016 at 08:22:24AM -0700, Doug Smythies wrote:
> On 2016.09.27 23:20 Joonsoo Kim wrote:
> > Oops...
> >
> > s/n->shared/old_shared/
>
> Yes, that seems to work fine. After boot everything is good.
> Then I tried and tried to get it to mess up, but could not.

Thanks for confirming. I will send a formal patch soon.

Thanks.
On Wed, Sep 28, 2016 at 11:09:53AM +0300, Vladimir Davydov wrote:
> On Tue, Sep 27, 2016 at 10:03:47PM -0400, Johannes Weiner wrote:
> > Vladimir, what do you think of embedding the work item into the
> > memcg_cache_array? That way we make sure we have exactly one work per
> > cache and not an unbounded number of them.
>
> We could set the entry of the root_cache->memcg_params.memcg_caches
> array corresponding to the cache being created to a special value, say
> (void*)1, and skip scheduling cache creation work on kmalloc if the
> caller sees it. I'm not sure it's really worth it though, because
> work_struct isn't that big (at least, in comparison with the cache
> itself) to avoid embedding it at all costs.

Hello, Johannes and Vladimir.

I'm not familiar with memcg, so I have a question about this solution.
It will solve the current issue, but if a burst of memcg creation happens,
a similar issue would happen again. Is my understanding correct?

I think that the other cause of the problem is that we call
synchronize_sched(), which is rather slow, while holding the slab_mutex,
and that blocks further kmem_cache creation. Should we fix that, too?

Thanks.
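A minimal sketch of the "(void*)1" sentinel idea Vladimir mentions, purely for illustration: the slot in the per-root-cache array is claimed before the creation work is queued, so a storm of allocations schedules at most one work item per (memcg, root cache) pair. The function name is made up, schedule_cache_creation() refers to the hypothetical helper in the earlier sketch, and the array layout only loosely follows memcg_cache_array; this is not a proposed patch.

/* Illustrative only; not actual kernel code or a proposed patch. */
#define MEMCG_CACHE_PENDING	((struct kmem_cache *)1)

static struct kmem_cache *
memcg_lookup_or_create_sketch(struct mem_cgroup *memcg,
			      struct kmem_cache *root_cache, int idx)
{
	struct memcg_cache_array *arr;
	struct kmem_cache *cachep;
	bool do_create = false;

	rcu_read_lock();
	arr = rcu_dereference(root_cache->memcg_params.memcg_caches);
	cachep = READ_ONCE(arr->entries[idx]);

	if (!cachep) {
		/* Claim the slot; only the winner queues the creation work. */
		if (cmpxchg(&arr->entries[idx], NULL, MEMCG_CACHE_PENDING) == NULL)
			do_create = true;
		cachep = NULL;
	} else if (cachep == MEMCG_CACHE_PENDING) {
		/* Creation already in flight; don't queue another work item. */
		cachep = NULL;
	}
	rcu_read_unlock();

	if (do_create)
		schedule_cache_creation(memcg, root_cache);

	/*
	 * NULL means "fall back to the root cache for now".  The creation
	 * worker later replaces the sentinel with the real per-memcg cache.
	 */
	return cachep;
}

As the discussion notes, this caps the pending work items for one (memcg, cache) pair but does not help when a burst of new memcgs touches many caches at once.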
On Thu, Sep 29, 2016 at 04:45:50PM +0300, Vladimir Davydov wrote:
> On Thu, Sep 29, 2016 at 11:00:50AM +0900, Joonsoo Kim wrote:
> > I'm not familiar with memcg, so I have a question about this solution.
> > It will solve the current issue, but if a burst of memcg creation
> > happens, a similar issue would happen again. Is my understanding correct?
>
> Yes, I think you're right - embedding the work_struct responsible for
> cache creation in the kmem_cache struct won't help if a thousand
> different cgroups call kmem_cache_alloc() simultaneously for a cache
> they haven't used yet.
>
> Come to think of it, we could fix the issue by simply introducing a
> special single-threaded workqueue used exclusively for cache creation
> works - cache creation is done mostly under the slab_mutex, anyway. This
> way, we wouldn't have to keep those used-once work_structs around for the
> whole kmem_cache lifetime.
>
> > I think that the other cause of the problem is that we call
> > synchronize_sched(), which is rather slow, while holding the slab_mutex,
> > and that blocks further kmem_cache creation. Should we fix that, too?
>
> Well, the patch you posted looks pretty obvious and it helps the
> reporter, so personally I don't see any reason for not applying it.

Oops... I forgot to mention why I asked that.

There is another report that a similar problem also happens with SLUB.
There, synchronize_sched() is called in the cache shrinking path while
holding the slab_mutex. I guess that it blocks further kmem_cache
creation. If we use a special single-threaded workqueue, the number of
kworkers would be limited, but kmem_cache creation would still be delayed
for a long time in a burst memcg creation/destroy scenario.

https://bugzilla.kernel.org/show_bug.cgi?id=172991

Do we need to remove synchronize_sched() in SLUB and find another solution?

Thanks.
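A small sketch of the dedicated-workqueue idea discussed above, for illustration only; the workqueue name, the init function, and the way it plugs into the hypothetical schedule_cache_creation() helper from the earlier sketch are assumptions, not the patch that was eventually posted.

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/workqueue.h>

/* Illustrative sketch; not the actual patch. */
static struct workqueue_struct *memcg_cache_create_wq;

static int __init memcg_cache_create_wq_init(void)
{
	/*
	 * An ordered workqueue (max_active == 1): creation works run one
	 * after another, so even a storm of requests keeps at most one
	 * extra kworker busy instead of spawning thousands.  Cache
	 * creation is mostly serialized by slab_mutex anyway, so little
	 * concurrency is lost.
	 */
	memcg_cache_create_wq = alloc_ordered_workqueue("memcg_cache_create", 0);
	return memcg_cache_create_wq ? 0 : -ENOMEM;
}
subsys_initcall(memcg_cache_create_wq_init);

/*
 * The scheduling side would then do
 *	queue_work(memcg_cache_create_wq, &cw->work);
 * instead of schedule_work(&cw->work).
 */

The trade-off Joonsoo raises still applies: with a single ordered queue, one item stuck behind slab_mutex or synchronize_sched() delays every creation request queued after it.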
On 2016.09.30 12:59 Vladimir Davydov wrote:

> Yeah, you're right. We'd better do something about this
> synchronize_sched(). I think moving it out of the slab_mutex and calling
> it once for all caches in memcg_deactivate_kmem_caches() would resolve
> the issue. I'll post the patches tomorrow.

Would someone please be kind enough to send me the patch set?

I didn't get it, and would like to test it.
I have searched and searched and did manage to find:
"[PATCH 2/2] slub: move synchronize_sched out of slab_mutex on shrink"
and a thread about patch 1 of 2:
"Re: [PATCH 1/2] mm: memcontrol: use special workqueue for creating per-memcg caches"
where I am listed as "reported by", but I guess "reported by" people don't get
the e-mails. I haven't found PATCH 0/2, nor do I know whether what I did find
is current.

... Doug
On Wed, Oct 05, 2016 at 10:04:27PM -0700, Doug Smythies wrote:
> Would someone please be kind enough to send me the patch set?
>
> I didn't get it, and would like to test it.
> [...]
> I haven't found PATCH 0/2, nor do I know whether what I did find is current.

I think that what you found is the correct one. It has no cover letter, so
there is no [PATCH 0/2]. Anyway, to clarify, I add links to these patches:

https://patchwork.kernel.org/patch/9361853
https://patchwork.kernel.org/patch/9359271

It would be very helpful if you could test these patches.

Thanks.
On 2016.10.05 23:35 Joonsoo Kim wrote:
> On Wed, Oct 05, 2016 at 10:04:27PM -0700, Doug Smythies wrote:
>> Would someone please be kind enough to send me the patch set?
>
> I think that what you found is the correct one. It has no cover letter, so
> there is no [PATCH 0/2]. Anyway, to clarify, I add links to these patches:
>
> https://patchwork.kernel.org/patch/9361853
> https://patchwork.kernel.org/patch/9359271
>
> It would be very helpful if you could test these patches.

Yes, as best as I am able to test, the 2-patch set solves both this SLAB
bug report and the other SLUB one.
On 2016.10.06 09:02 Doug Smythies wrote:
> On 2016.10.05 23:35 Joonsoo Kim wrote:
>> https://patchwork.kernel.org/patch/9361853
>> https://patchwork.kernel.org/patch/9359271
>>
>> It would be very helpful if you could test these patches.
>
> Yes, as best as I am able to test, the 2-patch set solves both this SLAB
> bug report and the other SLUB one.

I tested the patch from the other thread on top of these two,
and things continued to work fine. The additional patch does seem a
little faster under some of my hammering conditions.

Reference:
https://marc.info/?l=linux-kernel&m=147573486705407&w=2
I've seen these issues (2000 kworker threads) on memcg enabled 4.8.5 and 4.8.6 kernels in two situations:

1) directly after boot, probably triggered by either systemd itself or libvirt kvm machine startups. 2000 kworkers and load up to 30; load stabilizes almost immediately and the 2000 kworkers are gone after some minutes without reappearing.

2) when using systemd-nspawn to fire up a small container with systemd inside. Here the issue happens when I log in to the container. Same symptoms. Another login after waiting for the workers to die, once more creates 2000 of them for a while and pushes load.

After applying the three patches linked to in Doug Smythies' previous comment, against mainline 4.8.6 (SLAB), I can no longer reproduce the issue.

I'm running such a patched 4.8.6 on one of my production boxes now (for all of two hours...), with pretty intense mysql and apache workloads (inside KVM machines) on it, and so far everything seems quite fine.

So, I'd plead for mainline + stable inclusion, and provide:

Tested-By: Patrick Schaaf <kernelorg@bof.de>