Bug 209219
| Summary: | KSHAKER: scheduling/execution timing perturbations | | |
|---|---|---|---|
| Product: | Memory Management | Reporter: | Dmitry Vyukov (dvyukov) |
| Component: | Sanitizers | Assignee: | MM/Sanitizers virtual assignee (mm_sanitizers) |
| Status: | NEW --- | | |
| Severity: | enhancement | CC: | a.p.zijlstra, andreyknvl, dvyukov, glider, kasan-dev, melver, paulmck |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | ALL | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Dmitry Vyukov
2020-09-10 08:32:39 UTC
We had discussed a potential implementation, specifically "NMI injection" as the means to inject such delays. It's summarized here: https://github.com/google/syzkaller/issues/1891

Having an interface from userspace to inject NMIs that simply add a delay would enable e.g. syzkaller to generate programs that include such injected delays. There may be alternative designs as you propose as well, but using NMIs gives us arbitrary delay-injection points.

Right. I am not sure it's possible to inject NMIs anywhere besides qemu/kvm and special dev boards (i.e. not syzbot). But even there it's controlled from outside of the machine, while we want to control this from inside. Even if we expose a special kernel interface inside of the machine, it won't be possible to achieve the right granularity. E.g. on a machine with 1 CPU, user-space can't issue the request until the executing kernel code has been preempted for other reasons at an uncontrollable point. And at that point it's too late to preempt it: it's already preempted. Having this in the kernel in a cooperative way seems to provide much better portability, precision and effectiveness.

Another idea: if the places where we would be interested in inserting delays are limited, we could use kprobes.

I think one type of systematic testing is feasible as well. Namely, one-factor enumeration like we do for fault injection: delay the first point in a syscall, then the second, then the third and so on until we enumerate all of them. This will require some debugfs interface to arm this per-task and query whether the delay was injected or not.

Could these UAFs be detected by KCSAN? Maybe we could bundle the two, as KCSAN already instruments the code?

> Could these UAFs be detected by KCSAN?

KCSAN already instruments kfree() and will detect races between usage and kfree(). But we know that KASAN is still the better tool to detect UAFs, due to quarantine etc.

> Maybe we could bundle the two, as KCSAN already instruments the code?
KCSAN instruments memory accesses, and I think that's overkill/too fine-grained. From what I gather, we want to insert delays into strategic locations, such as synchronization or special functions, to enumerate interesting schedules. This will require (as suggested by Dmitry) a cooperative approach, inserting delay functions either directly or by means of kprobes etc. The other requirement seems to be that we want something that could be applied to all sanitizers, not just KCSAN.

On the whole, one direction I'm being reminded of is stateless model checking, which can be applied to real code to perturb schedules in a systematic way. One popular paper I'm aware of is the CHESS paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/pldi08-FairStatelessModelChecking.pdf

Here is a simple patch based on KCOV that does it: https://github.com/dvyukov/linux/commit/3ca715d1f7e1fbd592097149966d9034805e338a

It proved to trigger more bugs in local tests.

Remaining work:
1. Figure out how to properly check that a task can sleep. Should this check be moved to kernel/sched* code?
2. Abstract away/remove dependency on rdtsc.
3. Abstract away smap code (x86-specific).
4. Remove all hardcoded policy decisions and allow user-space to control them.

A big open question is whether randomization should be done for all tasks, or only for tasks that have a KCOV descriptor. I think the most flexible option would be to add an ioctl that allows enabling delays globally or for a given KCOV descriptor, and controlling all parameters (frequency/scale of delays). The per-KCOV setting should take precedence over the global one. This allows exploring all possible policies (either enable globally, or enable for each KCOV descriptor separately). The ioctl should also allow setting the random seed; this will be useful for snapshot-based mode and will remove the dependency on rdtsc.
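
To make the policy above concrete, here is a minimal user-space sketch in C of how the per-descriptor/global precedence and the seeded randomization could fit together. It is not taken from the linked patch: `struct delay_params`, `effective_params()`, `prng_next()` and `pick_delay_us()` are hypothetical names made up for illustration, the parameters (`one_in`, `max_delay_us`) are only assumed stand-ins for "frequency/scale of delays", and xorshift64 is just one convenient deterministic generator that could replace rdtsc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block; none of these names exist in the kernel. */
struct delay_params {
	bool     enabled;      /* inject delays at all? */
	uint32_t one_in;       /* delay roughly 1 out of every one_in points */
	uint32_t max_delay_us; /* upper bound for a single delay */
	uint64_t seed;         /* PRNG state, set from user space; must be non-zero */
};

/* A per-descriptor setting, if present, takes precedence over the global one. */
static struct delay_params *effective_params(struct delay_params *per_desc,
					     struct delay_params *global)
{
	return per_desc ? per_desc : global;
}

/* xorshift64: small deterministic PRNG standing in for rdtsc. */
static uint64_t prng_next(uint64_t *state)
{
	uint64_t x = *state;

	assert(x != 0); /* a zero state would stay zero forever */
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Returns the delay in microseconds for this point, or 0 for no delay. */
static uint32_t pick_delay_us(struct delay_params *p)
{
	if (!p || !p->enabled || p->one_in == 0 ||
	    p->max_delay_us == 0 || p->seed == 0)
		return 0;
	if (prng_next(&p->seed) % p->one_in != 0)
		return 0;
	return 1 + (uint32_t)(prng_next(&p->seed) % p->max_delay_us);
}
```

An instrumentation point would call `pick_delay_us(effective_params(desc, &global))` and sleep for the returned time only if the task is allowed to sleep. Because the seed is the only source of randomness, two runs started from the same seed make the same delay decisions, which is the property wanted for the snapshot-based mode.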