Both stack depot and KASAN stack ring (for the tag-based modes) require taking the read lock much more often than the write lock. Thus, using percpu-rwsem would likely be faster than the rwlock that is used by them right now. We need to implement the irqsave/irqrestore API flavor for percpu-rwsem and switch stack depot and KASAN to percpu-rwsem. Suggested by Marco in https://lore.kernel.org/lkml/CA+fCnZdAUo1CKDK4kiUyR+Fxc_F++CFezanPDVujx3u7fBmw=A@mail.gmail.com/.
(Note that the stack depot only starts using the rwlock with the "stackdepot: allow evicting stack traces" series, which will likely be merged into 6.8.)
It looks like we may not be able to use percpu-rwsem if we can't sleep. From another discussion: > > We need to figure out whether it's possible and how to add a > > irqdisable flavor for percpu-rwsem. I now see that it's a sleeping > > lock, so I don't know whether adding this flavor is possible or makes > > sense. > > Right, if we can't sleep in the paths we want to take the lock then it > might not even be possible to use percpu-rwsem. > > We also need to check whether it actually improves performance: I see > > in [1] that taking a write lock might take "hundreds of milliseconds". > > If taking the write lock is rare, this should not be a problem. > > But I notice that the write lock is taken on _every_ > stack_depot_put(). With more usage of stack_depot_put(), this will > introduce tons of contention. > > The least we can do is to take the read lock first, and only upgrade > to a write lock if the refcount has reached 0 to free the stack.