Bug 216038
Summary: | File lock caches accounting performance | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Michal Koutný (mkoutny) |
Component: | Slab Allocator | Assignee: | Andrew Morton (akpm) |
Status: | NEW --- | ||
Severity: | normal | CC: | guro, mkoutny, shakeelb |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | v5.18 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
flamegraph of baseline measurement, 5.18.0-2.g3352b92-default, cgroup=0
flamegraph of accounted measurement, 5.18.0-202205261821.g76c743f-default, cgroup=0
the wrapper of will-it-scale test referred to in comment 0
memcontrol: Cache objcg in task_struct |
Description
Michal Koutný
2022-05-27 17:11:29 UTC
Created attachment 301062 [details]
flamegraph of baseline measurement, 5.18.0-2.g3352b92-default, cgroup=0
Created attachment 301063 [details]
flamegraph of accounted measurement, 5.18.0-202205261821.g76c743f-default, cgroup=0
Some quick insights:
- locks_alloc_lock is not inlined in proband (that's weird with just
SLAB_ACCOUNT flag, it seems a different compiler version sneaked into my
rebuild)
- get_obj_cgroup_from_current -- adds ~2% of samples, some more time is
unattributed to new functions
- no obvious contention on global locks
Created attachment 301064 [details]
the wrapper of will-it-scale test referred to in comment 0

(In reply to Michal Koutný from comment #2)
> - locks_alloc_lock is not inlined in proband (that's weird with just
> SLAB_ACCOUNT flag, it seems a different compiler version sneaked into my
> rebuild)

Fixed baseline with same compiler:

k3 0 32307750 8484.23 0.00000

Disabling kmem accounting on the kernel cmdline eliminates the regression, as expected:

k1 0 nokmem 3.25532e+07 15937.9 0.75972

One more observation: the regression reported in comment 0 with cgroup=0 is caused by systemd transiently enabling the memory controller (hence unsealing memcg_kmem_enabled_key); in theory, a nokmem-like result should be achievable with the patched kernel too.

Curated perf-report entries with the patched kernel and 1-level memcg (overall locks_alloc_lock goes up +12%, cf -24% of regression):

children self
4.97% 4.66% [kernel.vmlinux] [k] mod_objcg_state
3.58% 1.78% [kernel.vmlinux] [k] get_obj_cgroup_from_current
1.82% 1.54% [kernel.vmlinux] [k] obj_cgroup_charge
1.67% 1.52% [kernel.vmlinux] [k] refill_obj_stock

Created attachment 301095 [details]
memcontrol: Cache objcg in task_struct

Quick idea of caching the objcg pointer in task_struct in order to save its repeated evaluation.

> k3 0 32307750 8484.23 0.00000 // baseline
> k5 0 3.08505e+07 10538 -4.5105 // acct+cache
> k5 1 2.51642e+07 12502.2 -22.111
> k4 0 30653597 22081 -5.1200 // acct
> k4 1 2.49577e+07 37243 -22.75000

The improvement is visible (above noise) but nothing spectacular.

The crucial question (wrt this particular file lock cache) is how much this regression manifests with real workloads (contemplation suggests it's amortized by many other operations). |
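
For readers comparing the result lines: the last column appears to be the percentage change of the measured mean relative to the k3 baseline. That reading is an assumption, since the column headers are not shown in this report. A minimal Python sketch that recomputes the column from the reported means follows; the names BASELINE, RESULTS and percent_change are hypothetical and are not part of the attached wrapper.

```python
# Means and last-column values copied from the result lines in this report.
# Assumption: the last column is the percent change relative to the k3 baseline.
BASELINE = 32307750.0  # "k3 0", fixed baseline built with the same compiler

# (label as printed in the report, reported mean, reported last column)
RESULTS = [
    ("k1 0 nokmem", 3.25532e+07, 0.75972),
    ("k5 0", 3.08505e+07, -4.5105),    # acct+cache
    ("k5 1", 2.51642e+07, -22.111),
    ("k4 0", 30653597.0, -5.1200),     # acct
    ("k4 1", 2.49577e+07, -22.75000),
]


def percent_change(mean, baseline=BASELINE):
    """Return the change of `mean` relative to `baseline`, in percent."""
    return (mean - baseline) / baseline * 100.0


if __name__ == "__main__":
    for label, mean, reported in RESULTS:
        # Print the recomputed value next to the value given in the report.
        print(f"{label:12s} computed={percent_change(mean):9.4f} reported={reported:9.4f}")
```

Under that assumption the recomputed values agree with the reported column to within the rounding of the printed means, so the acct+cache kernel (k5) recovers only a fraction of a percentage point over the accounted kernel (k4) at either cgroup level.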