Provide an implementation of alloc/free stack trace collection that is suitable for production use with regard to both memory and performance impact.

The limitations of the current stack trace collection implementation:

- 30% performance impact
- Stack trace handles are stored in redzones, which doubles kmalloc allocation sizes [1]
- STACKDEPOT only saves new stack traces and never deletes obsolete ones

[1] https://bugzilla.kernel.org/show_bug.cgi?id=209821
The planned steps to resolve this are:

1. Speed up stack trace collection (potentially, by using Shadow Call Stack; patches on-hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (merged into the mainline [1]).
3. Add a memory-bounded mode to stack depot or provide an alternative memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize the performance impact.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca77f290cff1dfa095d71ae16cc7cda8ee6df495
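To illustrate step #2, here is a minimal userspace sketch of a stack ring: a fixed-size circular buffer that records (object, stack handle) pairs outside the objects themselves, so the per-object redzone does not have to grow. All names (`stack_ring`, `ring_record`, `ring_lookup`, `RING_SIZE`) are hypothetical and chosen for illustration; the sketch is single-threaded and does not mirror the kernel data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4 /* hypothetical; tiny on purpose for illustration */

typedef uint32_t stack_handle_t; /* hypothetical stand-in for a depot handle */

struct ring_entry {
	uintptr_t addr;       /* start address of the object */
	size_t size;          /* object size in bytes */
	stack_handle_t stack; /* handle of the alloc or free stack trace */
	bool is_free;         /* true for a free event, false for an alloc */
};

struct stack_ring {
	uint64_t pos; /* total number of events recorded so far */
	struct ring_entry entries[RING_SIZE];
};

/* Record an alloc/free event, overwriting the oldest entry when full. */
static void ring_record(struct stack_ring *ring, const void *ptr, size_t size,
			stack_handle_t stack, bool is_free)
{
	struct ring_entry *e = &ring->entries[ring->pos++ % RING_SIZE];

	assert(stack != 0); /* 0 is reserved to mean "not found" */
	e->addr = (uintptr_t)ptr;
	e->size = size;
	e->stack = stack;
	e->is_free = is_free;
}

/*
 * Return the handle of the most recent event of the given kind whose object
 * covers ptr, or 0 if that event has already been overwritten.
 */
static stack_handle_t ring_lookup(const struct stack_ring *ring,
				  const void *ptr, bool is_free)
{
	uintptr_t addr = (uintptr_t)ptr;
	uint64_t n = ring->pos < RING_SIZE ? ring->pos : RING_SIZE;

	for (uint64_t i = 1; i <= n; i++) {
		const struct ring_entry *e =
			&ring->entries[(ring->pos - i) % RING_SIZE];

		if (e->is_free == is_free && addr >= e->addr &&
		    addr - e->addr < e->size)
			return e->stack;
	}
	return 0;
}
```

The trade-off is visible in `ring_lookup`: memory use is fixed by `RING_SIZE`, but the trace for an old enough event is lost once its entry is overwritten.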
#3 is partially implemented with the "stackdepot: allow evicting stack traces" series (will likely be merged into 6.8).

To complete #3, we need to resolve https://bugzilla.kernel.org/show_bug.cgi?id=218314 (allow bounding memory usage via command line) and, optionally, implement the first proposal from https://bugzilla.kernel.org/show_bug.cgi?id=218313 (reduce memory usage for storing stack traces).

Before considering implementing sampling as suggested in #4, we should implement https://bugzilla.kernel.org/show_bug.cgi?id=218312 and measure the performance impact of stack trace collection on MTE-enabled hardware (e.g. Pixel 8). It is unlikely but possible that together with some stack trace collection optimizations as suggested in #1, #4 will not be required.

Another potential idea to consider for #1 is saving stack traces directly into a stack depot slot to avoid an additional memcpy. However, this might be non-trivial and will likely require reworking the locking strategy used by the stack depot code.
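For the memory-bounded storage discussed for #3, one possible shape is a fixed pool of slots with reference counts, where a slot becomes reusable once its last user drops it. The sketch below is a userspace illustration under that assumption; `depot_get`, `depot_put`, `POOL_SLOTS`, and `MAX_FRAMES` are hypothetical names, the lookup is a linear scan rather than a hash table, and there is no locking. It is not a description of how the eviction series is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES 8 /* hypothetical per-trace frame limit */
#define POOL_SLOTS 2 /* hypothetical memory bound, tiny for illustration */

struct slot {
	unsigned int refcount; /* 0 means the slot is free */
	size_t nr;             /* number of valid frames */
	uintptr_t frames[MAX_FRAMES];
};

static struct slot pool[POOL_SLOTS];

/*
 * Take a reference to a stored copy of the trace, storing it first if needed.
 * Returns a non-zero handle, or 0 if the trace is invalid or the pool is full.
 */
static unsigned int depot_get(const uintptr_t *frames, size_t nr)
{
	int free_idx = -1;

	if (nr == 0 || nr > MAX_FRAMES)
		return 0;

	for (int i = 0; i < POOL_SLOTS; i++) {
		struct slot *s = &pool[i];

		if (s->refcount == 0) {
			if (free_idx < 0)
				free_idx = i; /* remember first free slot */
			continue;
		}
		if (s->nr == nr &&
		    memcmp(s->frames, frames, nr * sizeof(*frames)) == 0) {
			s->refcount++; /* trace already stored, share it */
			return (unsigned int)i + 1;
		}
	}

	if (free_idx < 0)
		return 0; /* bound reached: the caller goes without a trace */

	pool[free_idx].refcount = 1;
	pool[free_idx].nr = nr;
	memcpy(pool[free_idx].frames, frames, nr * sizeof(*frames));
	return (unsigned int)free_idx + 1;
}

/* Drop a reference; the slot can be reused once the count reaches zero. */
static void depot_put(unsigned int handle)
{
	assert(handle >= 1 && handle <= POOL_SLOTS);
	assert(pool[handle - 1].refcount > 0);
	pool[handle - 1].refcount--;
}
```

With this shape, total memory is fixed by `POOL_SLOTS`, which is the kind of limit a command-line parameter could set. The cost is that `depot_get` can fail, so callers must tolerate a missing trace.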
> Another potential idea to consider for #1 is saving stack traces directly
> into a stack depot slot to avoid an additional memcpy. However, this might be
> non-trivial and will likely require reworking the locking strategy used by
> the stack depot code.

Thinking more about this: this might not be possible, as we need to collect the stack trace first to calculate its hash to check whether it's already present in the stack depot.
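The ordering constraint can be seen in a minimal userspace sketch of a deduplicating trace store. The caller has to hand over a complete trace in a temporary buffer, because the hash that selects the bucket is computed over all frames; the copy into permanent storage happens only on a miss. The names (`trace_save`, `hash_trace`, `NR_BUCKETS`, `NR_RECORDS`) are hypothetical, the hash is FNV-1a picked purely for illustration, and there is no locking, so this is not the stack depot code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES 8  /* hypothetical per-trace frame limit */
#define NR_BUCKETS 16 /* hypothetical hash table size */
#define NR_RECORDS 32 /* hypothetical storage capacity */

struct record {
	struct record *next; /* next record in the same bucket */
	uint32_t hash;
	size_t nr;
	uintptr_t frames[MAX_FRAMES];
};

static struct record records[NR_RECORDS];
static size_t nr_used;
static struct record *buckets[NR_BUCKETS];

/* 32-bit FNV-1a over the raw bytes of the trace (illustrative choice). */
static uint32_t hash_trace(const uintptr_t *frames, size_t nr)
{
	const unsigned char *p = (const unsigned char *)frames;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < nr * sizeof(*frames); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * frames must already hold the complete trace: the hash depends on every
 * frame, so the lookup cannot start before collection has finished.
 */
static const struct record *trace_save(const uintptr_t *frames, size_t nr)
{
	assert(frames != NULL);
	if (nr == 0 || nr > MAX_FRAMES)
		return NULL;

	uint32_t hash = hash_trace(frames, nr);
	struct record **bucket = &buckets[hash % NR_BUCKETS];

	/* Hit: the trace is already stored, so no copy is made. */
	for (struct record *r = *bucket; r; r = r->next)
		if (r->hash == hash && r->nr == nr &&
		    memcmp(r->frames, frames, nr * sizeof(*frames)) == 0)
			return r;

	if (nr_used == NR_RECORDS)
		return NULL; /* storage exhausted */

	/* Miss: this is the only path that pays for the memcpy. */
	struct record *r = &records[nr_used++];
	r->hash = hash;
	r->nr = nr;
	memcpy(r->frames, frames, nr * sizeof(*frames));
	r->next = *bucket;
	*bucket = r;
	return r;
}
```

Collecting directly into a fresh slot would mean reserving that slot before knowing whether the trace is a duplicate, and releasing it again on every hit.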