Bug 211785

Summary: KASAN (hw-tags): production-grade alloc/free stack traces
Product: Memory Management
Component: Sanitizers
Reporter: Andrey Konovalov (andreyknvl)
Assignee: MM/Sanitizers virtual assignee (mm_sanitizers)
CC: kasan-dev
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: upstream
Regression: No

Description Andrey Konovalov 2021-02-15 19:49:25 UTC
Provide an implementation of alloc/free stack trace collection that is suitable for production use with regard to both memory and performance impact.

The limitations of the current stack trace collection implementation:
- 30% performance impact
- Stack trace handles are stored in redzones, which doubles kmalloc allocation sizes [1]
- STACKDEPOT only saves new stack traces and never deletes obsolete ones

[1] https://bugzilla.kernel.org/show_bug.cgi?id=209821

Comment 1 Andrey Konovalov 2022-10-18 18:59:18 UTC
The planned steps to resolve this are:

1. Speed up stack trace collection (potentially by using Shadow Call Stack; the patches are on hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (merged into the mainline [1]).
3. Add a memory-bounded mode to stack depot or provide an alternative memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize the performance impact.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca77f290cff1dfa095d71ae16cc7cda8ee6df495
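
For readers unfamiliar with the idea in step #2, here is a minimal user-space sketch of a stack ring: a fixed-size circular buffer that records (object, stack handle) pairs, so nothing has to be stored next to the object itself. All names (`stack_ring`, `ring_record`, `ring_find`, `RING_SIZE`) are hypothetical and chosen for illustration only. The sketch ignores concurrency and does not reflect the layout of the code merged in [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4 /* hypothetical; tiny so that wrap-around is easy to see */

typedef uint32_t stack_handle_t; /* stand-in for a stack depot handle */

struct ring_entry {
	const void *object;   /* address of the allocated/freed object */
	stack_handle_t stack; /* handle of the alloc or free stack trace */
	bool is_free;         /* true if this entry describes a free */
};

struct stack_ring {
	uint64_t pos; /* total number of entries written so far */
	struct ring_entry entries[RING_SIZE];
};

/* Record an alloc or free event, overwriting the oldest entry when full. */
static void ring_record(struct stack_ring *ring, const void *object,
			stack_handle_t stack, bool is_free)
{
	assert(ring != NULL);
	struct ring_entry *e = &ring->entries[ring->pos % RING_SIZE];

	e->object = object;
	e->stack = stack;
	e->is_free = is_free;
	ring->pos++;
}

/* Return the newest entry for the object, or NULL if it was overwritten. */
static const struct ring_entry *ring_find(const struct stack_ring *ring,
					  const void *object)
{
	uint64_t n = ring->pos < RING_SIZE ? ring->pos : RING_SIZE;

	for (uint64_t i = 1; i <= n; i++) {
		const struct ring_entry *e =
			&ring->entries[(ring->pos - i) % RING_SIZE];

		if (e->object == object)
			return e;
	}
	return NULL;
}
```

The trade-off this illustrates: memory use is fixed by the ring size regardless of how many objects exist, but the stack trace for an old object may be gone by the time a bug involving it is reported.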

Comment 2 Andrey Konovalov 2023-12-25 17:25:16 UTC
#3 is partially implemented with the "stackdepot: allow evicting stack traces" series (will likely be merged into 6.8). To complete #3, we need to resolve https://bugzilla.kernel.org/show_bug.cgi?id=218314 (allow bounding memory usage via command line) and, optionally, implement the first proposal from https://bugzilla.kernel.org/show_bug.cgi?id=218313 (reduce memory usage for storing stack traces).
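
As a rough illustration of what eviction plus a hard memory bound could look like, here is a minimal user-space sketch of a fixed-size pool of reference-counted records. The names (`pool_get`, `pool_put`, `POOL_SLOTS`) and the use of a plain integer `trace_id` in place of a real stack trace are assumptions made for brevity. This is not the stack depot API or its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 2 /* hypothetical hard bound on stored traces */

struct record {
	uint32_t trace_id;     /* stand-in for the stored stack trace */
	unsigned int refcount; /* 0 means the slot is free for reuse */
};

static struct record pool[POOL_SLOTS];

/*
 * Take a reference to the record for trace_id, creating it if needed.
 * Returns a handle in 1..POOL_SLOTS, or 0 if the pool is full.
 */
static uint32_t pool_get(uint32_t trace_id)
{
	size_t free_idx = POOL_SLOTS;

	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (pool[i].refcount > 0 && pool[i].trace_id == trace_id) {
			pool[i].refcount++;
			return (uint32_t)i + 1;
		}
		if (pool[i].refcount == 0 && free_idx == POOL_SLOTS)
			free_idx = i;
	}
	if (free_idx == POOL_SLOTS)
		return 0; /* bound reached: caller goes without a trace */

	pool[free_idx].trace_id = trace_id;
	pool[free_idx].refcount = 1;
	return (uint32_t)free_idx + 1;
}

/* Drop a reference; the slot becomes reusable once the count hits 0. */
static void pool_put(uint32_t handle)
{
	assert(handle <= POOL_SLOTS);
	if (handle == 0)
		return;
	if (pool[handle - 1].refcount > 0)
		pool[handle - 1].refcount--;
}
```

With a bound like this, memory use cannot grow past `POOL_SLOTS` records, at the cost of some allocations having no recorded stack trace when the pool is exhausted.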

Before considering implementing sampling as suggested in #4, we should implement https://bugzilla.kernel.org/show_bug.cgi?id=218312 and measure the performance impact of stack trace collection on MTE-enabled hardware (e.g. Pixel 8). It is unlikely but possible that, combined with the stack trace collection optimizations suggested in #1, this will show that #4 is not required.
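
For reference, the sampling idea in #4 could be as simple as collecting a stack trace for one out of every N alloc/free events. The sketch below shows a counter-based version. The `sampler` structure, `should_collect()`, and the `interval` parameter are hypothetical names, and a real implementation would likely need per-CPU state, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sampler {
	unsigned int interval;  /* collect once per this many events */
	unsigned int countdown; /* events left to skip before collecting */
};

/* Return true if a stack trace should be collected for this event. */
static bool should_collect(struct sampler *s)
{
	assert(s != NULL);
	if (s->interval <= 1)
		return true; /* sampling disabled: collect every time */

	if (s->countdown == 0) {
		s->countdown = s->interval - 1;
		return true;
	}
	s->countdown--;
	return false;
}
```

With `interval` set to 4, the first event and every fourth event after it are sampled, so the collection cost is paid on roughly a quarter of the events while the rest carry no stack trace.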

Another potential idea to consider for #1 is saving stack traces directly into a stack depot slot to avoid an additional memcpy. However, this might be non-trivial and will likely require reworking the locking strategy used by the stack depot code.

Comment 3 Andrey Konovalov 2023-12-25 17:28:32 UTC
> Another potential idea to consider for #1 is saving stack traces directly
> into a stack depot slot to avoid an additional memcpy. However, this might be
> non-trivial and will likely require reworking the locking strategy used by
> the stack depot code.

On reflection, this might not be possible: the stack trace has to be collected first so that its hash can be computed and used to check whether the trace is already present in the stack depot.
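
To make the ordering constraint concrete, here is a minimal user-space sketch of a deduplicating store. The complete trace has to be in a caller-owned buffer before `depot_save()` can hash it and search for an existing copy, and the copy into a slot only happens on a miss. The FNV-1a hash, the open-addressing table, and all names (`depot_save`, `hash_frames`, `NUM_SLOTS`, `MAX_FRAMES`) are stand-ins chosen for brevity, not the stack depot's actual data structures or hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES 8
#define NUM_SLOTS 16

struct slot {
	int used;
	uint32_t hash;
	size_t nr;
	uintptr_t frames[MAX_FRAMES];
};

static struct slot depot[NUM_SLOTS];

/* FNV-1a over the raw bytes of the frame array (illustrative choice). */
static uint32_t hash_frames(const uintptr_t *frames, size_t nr)
{
	const unsigned char *p = (const unsigned char *)frames;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < nr * sizeof(*frames); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Return a handle (slot index + 1) for the trace, or 0 on failure. */
static uint32_t depot_save(const uintptr_t *frames, size_t nr)
{
	assert(frames != NULL);
	if (nr == 0 || nr > MAX_FRAMES)
		return 0;

	/* The hash needs the whole trace, so it must already be collected. */
	uint32_t hash = hash_frames(frames, nr);

	for (size_t probe = 0; probe < NUM_SLOTS; probe++) {
		size_t i = (hash + probe) % NUM_SLOTS;
		struct slot *s = &depot[i];

		if (!s->used) {
			/* Miss: only now is the trace copied into a slot. */
			s->used = 1;
			s->hash = hash;
			s->nr = nr;
			memcpy(s->frames, frames, nr * sizeof(*frames));
			return (uint32_t)i + 1;
		}
		if (s->hash == hash && s->nr == nr &&
		    memcmp(s->frames, frames, nr * sizeof(*frames)) == 0)
			return (uint32_t)i + 1; /* hit: reuse existing copy */
	}
	return 0; /* table full */
}
```

In this sketch the slot is chosen from the hash, so writing the trace straight into its final slot would require knowing the hash before the trace exists. That is the circular dependency described above.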