Bug 216038 - File lock caches accounting performance
Summary: File lock caches accounting performance
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Slab Allocator
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-05-27 17:11 UTC by Michal Koutný
Modified: 2022-06-03 12:03 UTC
3 users

See Also:
Kernel Version: v5.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments
flamegraph of baseline measurement, 5.18.0-2.g3352b92-default, cgroup=0 (48.20 KB, image/svg+xml)
2022-05-27 17:12 UTC, Michal Koutný
flamegraph of accounted measurement, 5.18.0-202205261821.g76c743f-default, cgroup=0 (57.29 KB, image/svg+xml)
2022-05-27 17:23 UTC, Michal Koutný
the wrapper of the will-it-scale test referred to in comment 0 (539 bytes, text/plain)
2022-05-27 17:31 UTC, Michal Koutný
memcontrol: Cache objcg in task_struct (3.40 KB, patch)
2022-06-03 12:03 UTC, Michal Koutný

Description Michal Koutný 2022-05-27 17:11:29 UTC
Commit 0f12156dff2862ac54235fc72703f18770769042 ("memcg: enable accounting for file lock caches") added memcg accounting for the file lock related caches; however, it turned out to cause a performance regression [1] -- the synthetic benchmark will-it-scale drops by ~34%. The accounting was therefore reverted [1].
At the same time, unaccounted file locks can be abused to evade memcg limits [2].
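
For context, the reverted change itself is tiny; from memory (the upstream
commit is authoritative), the gist is just ORing SLAB_ACCOUNT into the flags
when the two file lock caches are created in fs/locks.c:

    /* fs/locks.c, filelock_init() -- approximate shape of the change: with
     * SLAB_ACCOUNT, every object allocated from these caches is charged to
     * the allocating task's memcg. */
    flctx_cache = kmem_cache_create("file_lock_ctx",
                    sizeof(struct file_lock_context), 0,
                    SLAB_PANIC | SLAB_ACCOUNT, NULL);
    filelock_cache = kmem_cache_create("file_lock_cache",
                    sizeof(struct file_lock), 0,
                    SLAB_PANIC | SLAB_ACCOUNT, NULL);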

I'm filing this bug to resolve both the performance issue and the missing accounting (or at least to collect and track the available info).

I ran the attached test script and got the following data.

kernel  cgroup          metric          std     rel improvement [%]
k1      0               3.05892e+07     7812.89          -7.22589  
k1      1               2.50059e+07     9194.39         -24.15951
k2      0               3.29717e+07     21999.1           0.00000 (baseline)
k2      1               32943691        13125.1          -0.08495

The metric is taken directly from the will-it-scale lock1 report (higher is better).
k2 = 5.18.0-2.g3352b92-default (openSUSE TW stable kernel)
k1 = 5.18.0-202205261821.g76c743f-default (ditto with revert of 3754707bcc3)

cgroup = 0 means the benchmark runs in root memcg
cgroup = 1 means the benchmark runs in memcg of depth 1 (child of root)

My system had 48 CPUs and the benchmark ran in 24 parallel instances (that's less parallelism than in the kernel test robot's report [1], so my measured relative drop may be less pronounced).
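
For anyone unfamiliar with the benchmark: lock1 hits exactly the accounted
cache, since every iteration allocates and frees a struct file_lock.
Paraphrased from memory (will-it-scale's tests/lock1.c is authoritative), the
per-iteration body is roughly:

    #include <fcntl.h>
    #include <unistd.h>

    /* Approximate body of one will-it-scale lock1 iteration: set and release
     * a write lock on a private temp file, which allocates a struct file_lock
     * from filelock_cache and returns it to the slab on unlock. */
    static void lock1_iteration(int fd)
    {
            struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

            fcntl(fd, F_SETLK, &fl);        /* take the lock */
            fl.l_type = F_UNLCK;
            fcntl(fd, F_SETLK, &fl);        /* drop the lock */
    }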


[1] https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
[2] https://github.com/kata-containers/kata-containers/issues/3373
Comment 1 Michal Koutný 2022-05-27 17:12:58 UTC
Created attachment 301062 [details]
flamegraph of baseline measurement, 5.18.0-2.g3352b92-default, cgroup=0
Comment 2 Michal Koutný 2022-05-27 17:23:55 UTC
Created attachment 301063 [details]
flamegraph of accounted measurement, 5.18.0-202205261821.g76c743f-default, cgroup=0

Some quick insights:
- locks_alloc_lock is not inlined in the kernel under test (that's weird with
  just the SLAB_ACCOUNT flag; it seems a different compiler version sneaked
  into my rebuild)
- get_obj_cgroup_from_current -- adds ~2% of samples; some more time remains
  unattributed to the new functions
- no obvious contention on global locks
Comment 3 Michal Koutný 2022-05-27 17:31:34 UTC
Created attachment 301064 [details]
the wrapper of the will-it-scale test referred to in comment 0
Comment 4 Michal Koutný 2022-06-01 08:48:42 UTC
(In reply to Michal Koutný from comment #2)
> - locks_alloc_lock is not inlined in the kernel under test (that's weird with
>   just the SLAB_ACCOUNT flag; it seems a different compiler version sneaked
>   into my rebuild)

Fixed baseline with the same compiler:
k3      0               32307750         8484.23        0.00000

Disabling kmem accounting on the kernel cmdline eliminates the regression, as expected:
k1      0 nokmem        3.25532e+07     15937.9         0.75972

One more observation: the regression reported in comment 0 with cgroup=0 is
caused by systemd transiently enabling the memory controller (hence unsealing
memcg_kmem_enabled_key); in theory a nokmem-like result should be achievable
with a patched kernel too.

Curated perf-report entries with the patched kernel and a 1-level memcg (overall
locks_alloc_lock goes up +12%, cf. the -24% regression):

	children	self
	4.97%		4.66%  [kernel.vmlinux]  [k] mod_objcg_state
	3.58%		1.78%  [kernel.vmlinux]  [k] get_obj_cgroup_from_current
	1.82%		1.54%  [kernel.vmlinux]  [k] obj_cgroup_charge
	1.67%		1.52%  [kernel.vmlinux]  [k] refill_obj_stock
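
To map those symbols onto the allocation path (a simplified sketch from memory,
not the verbatim v5.18 hook in mm/slab.h; the wrapper name below is made up,
the called functions are the real ones from the table):

    /* Extra per-allocation work once a cache is SLAB_ACCOUNT'ed and kmem
     * accounting is live: resolve the task's objcg and charge the object. */
    static bool account_slab_alloc(gfp_t gfp, size_t size,
                                   struct obj_cgroup **objcgp)
    {
            struct obj_cgroup *objcg;

            objcg = get_obj_cgroup_from_current(); /* task -> memcg -> objcg  */
            if (!objcg)
                    return true;                   /* root memcg, no charging */

            if (obj_cgroup_charge(objcg, gfp, size))
                    return false;                  /* charged via the per-cpu
                                                    * stock, replenished by
                                                    * refill_obj_stock()      */

            *objcgp = objcg;  /* mod_objcg_state() later updates the per-memcg
                               * NR_SLAB_* counters */
            return true;
    }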
Comment 5 Michal Koutný 2022-06-03 12:03:20 UTC
Created attachment 301095 [details]
memcontrol: Cache objcg in task_struct

Quick idea of caching the objcg pointer in task_struct in order to save its
repeated evaluation.
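
The attached patch is what was actually measured; purely to illustrate the idea
(field and helper names below are made up, and invalidation on cgroup migration
or memcg offlining is only hinted at in a comment), the shape is:

    /* Illustration only -- see the attached patch for the real change.
     * Cache the objcg pointer per task so the slab fast path can skip the
     * task -> memcg -> objcg walk done by get_obj_cgroup_from_current()
     * on every accounted allocation. */
    struct task_struct {
            /* ... existing members ... */
    #ifdef CONFIG_MEMCG_KMEM
            struct obj_cgroup *objcg_cache;         /* hypothetical field */
    #endif
    };

    static inline struct obj_cgroup *current_objcg(void)
    {
            struct obj_cgroup *objcg = READ_ONCE(current->objcg_cache);

            if (likely(objcg))
                    return objcg;

            /* Slow path: derive once and cache; the cached pointer must be
             * dropped when the task migrates to another cgroup or its memcg
             * goes offline. */
            objcg = get_obj_cgroup_from_current();
            WRITE_ONCE(current->objcg_cache, objcg);
            return objcg;
    }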

> k3      0               32307750        8484.23         0.00000   // baseline
> k5      0               3.08505e+07     10538          -4.5105    // acct+cache
> k5      1               2.51642e+07     12502.2       -22.111     // acct+cache
> k4      0               30653597        22081          -5.1200    // acct
> k4      1               2.49577e+07     37243         -22.75000   // acct

The improvement is visible (above noise) but nothing spectacular.


The crucial question (wrt this particular file lock cache) is how much this
regression manifests with real workloads (contemplation suggests it's amortized
by many other operations).
