Bug 203497

Summary: KASAN (sw-tags): support stack instrumentation
Product: Memory Management Reporter: Andrey Konovalov (andreyknvl)
Component: SanitizersAssignee: MM/Sanitizers virtual assignee (mm_sanitizers)
Status: RESOLVED CODE_FIX    
Severity: normal CC: dvyukov, kasan-dev, melver, walter-zh.wu
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: upstream Subsystem:
Regression: No Bisected commit-id:

Description Andrey Konovalov 2019-05-03 16:03:25 UTC
Currently it is disabled via -hwasan-instrument-stack=0 by default.
Comment 1 Walter Wu 2020-06-15 06:50:40 UTC
Hi Andrey,

I try to make -hwasan-instrument-stack=1 and KASAN_OUTLINE=y, then it gets many false positive case as you saying. We break down the case which executed kasan_stack_oob(), the following is shown.


a). Modify the following testcase is not oob case to see why it is triggered. 

static noinline void kasan_stack_oob(void)
{
    char stack_array[80];
    char *p = &stack_array[17]; // it should not trigger

    *(volatile char *)p;
    pr_info("out-of-bounds on stack 0x%lx\n",p);
}


b). trigger by kasan_stack_oob+0x48

[   36.096929] ==================================================================
[   36.097867] BUG: KASAN: invalid-access in kasan_stack_oob+0x48/0x94
[   36.098465] Read of size 1 at addr 00ff0000704875b1 by task cat/179
[   36.098996] Pointer tag: [00], memory tag: [04]
[   36.099421]
[   36.099996] CPU: 3 PID: 179 Comm: cat Tainted: G    B             5.6.0-next-20200408-dirty #13
[   36.100617] Hardware name: linux,dummy-virt (DT)
[   36.101049] Call trace:
[   36.101489]  dump_backtrace+0x0/0x260
[   36.101976]  show_stack+0x14/0x1c
[   36.102451]  dump_stack+0xe0/0x150
[   36.102944]  print_address_description+0x8c/0x398
[   36.103485]  __kasan_report+0x14c/0x22c
[   36.103991]  kasan_report+0x3c/0x58
[   36.104470]  check_memory_region+0x98/0xa0
[   36.104981]  __hwasan_load1_noabort+0x18/0x20
[   36.105501]  kasan_stack_oob+0x48/0x94
...
[   36.118039] Memory state around the buggy address:
[   36.118593]  ffff000070487300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   36.119247]  ffff000070487400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   36.119894] >ffff000070487500: 00 00 00 00 00 00 00 00 00 00 04 04 04 04 04 00
[   36.120473]                                                     ^
[   36.121067]  ffff000070487600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   36.121712]  ffff000070487700: 00 00 00 00 00 00 00 00 00 00 34 34 34 f4 74 74
[   36.122271] ==================================================================


c). We think the wrong is "and x20, x10, x9", x20 doesn't get correct tag which will store in shadow memory, x20[56:63] and x1[0:7] should have the same? so the next tag comparing may wrong.

ffff9000104a331c <kasan_stack_oob>:
ffff9000104a331c:       d10203ff        sub     sp, sp, #0x80
ffff9000104a3320:       a9067bfd        stp     x29, x30, [sp, #96]
ffff9000104a3324:       a9074ff4        stp     x20, x19, [sp, #112]
ffff9000104a3328:       910183fd        add     x29, sp, #0x60
ffff9000104a332c:       d000eec8        adrp    x8, ffff90001227d000 <page_wait_table+0x14c0>
ffff9000104a3330:       f944a508        ldr     x8, [x8, #2376]
ffff9000104a3334:       ca5d53a1        eor     x1, x29, x29, lsr #20
ffff9000104a3338:       92ffe009        mov     x9, #0xffffffffffffff           // #72057594037927935
ffff9000104a333c:       910003ea        mov     x10, sp
ffff9000104a3340:       b3481c29        bfi     x9, x1, #56, #8
ffff9000104a3344:       910003e0        mov     x0, sp
ffff9000104a3348:       52800a02        mov     w2, #0x50                       // #80
ffff9000104a334c:       f81f83a8        stur    x8, [x29, #-8]
ffff9000104a3350:       8a090154        and     x20, x10, x9
ffff9000104a3354:       97fb24c7        bl      ffff90001036c670 <__hwasan_tag_memory>
ffff9000104a3358:       91004693        add     x19, x20, #0x11
ffff9000104a335c:       aa1303e0        mov     x0, x19
ffff9000104a3360:       97fb2466        bl      ffff90001036c4f8 <__hwasan_load1_noabort>
Comment 2 Andrey Konovalov 2020-06-15 19:01:16 UTC
AFAIU the issue here is that HWASAN instrumentation doesn't expect the sp register to be tagged, but with KASAN it might happen if the stack is allocated in slab memory.

This patch should fix the issue:

diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..c9c76a4d1180 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -261,7 +261,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
                                             THREAD_SIZE_ORDER);
 
        if (likely(page)) {
-               tsk->stack = page_address(page);
+               tsk->stack = kasan_reset_tag(page_address(page));
                return tsk->stack;
        }
        return NULL;
@@ -307,6 +307,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk,
 {
        unsigned long *stack;
        stack = kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+       stack = kasan_reset_tag(stack);
        tsk->stack = stack;
        return stack;
 }

However even with this change there's something else that's wrong, I still see the following crash during boot:

==================================================================
BUG: KASAN: invalid-access in start_kernel+0xd0/0x568
Read of size 8 at addr 63ff900012337f90 by task swapper/0
Pointer tag: [63], memory tag: [ff]

CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-15086-gb3a9e3b9622a-dirty #64
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x2e4 arch/arm64/kernel/time.c:51
 show_stack+0x1c/0x28 arch/arm64/kernel/traps.c:142
 __dump_stack lib/dump_stack.c:77
 dump_stack+0xf0/0x16c lib/dump_stack.c:118
 print_address_description+0x7c/0x308 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513
 kasan_report+0x19c/0x26c mm/kasan/report.c:530
 check_memory_region+0xa4/0xac mm/kasan/tags.c:127
 __hwasan_load8_noabort+0x44/0x50 mm/kasan/tags.c:144
 start_kernel+0xd0/0x568 init/main.c:854


Memory state around the buggy address:
 ffff900012337d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff900012337e00: 00 00 00 00 00 00 00 00 00 00 00 ff 00 00 ff ff
>ffff900012337f00: ff 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                              ^
 ffff900012338000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff900012338100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Comment 3 Walter Wu 2020-06-16 02:22:25 UTC
If I understand correctly, I originally think this issue can fixed by adding an additional instruction "bfi x20, x1, #56, #8" after this instruction "and x20, x10, x9". After this modification, x20[56:63] can get the x1[0:7].  Its purpose is shown below:

1) x20[56:63] is pointer tag. It will needed by 3).
2) x1[0:7] is random tag, it generate from "eor x1, x29, x29, lsr #20". This tag will store in shadow memory. It will needed by 3).
3) __hwasan_load1_noabort() will check whether x20[56:63] and x1[0:7] are the same.

---
assemble code:

ffff9000104a331c <kasan_stack_oob>:
ffff9000104a331c:       d10203ff        sub     sp, sp, #0x80
ffff9000104a3320:       a9067bfd        stp     x29, x30, [sp, #96]
ffff9000104a3324:       a9074ff4        stp     x20, x19, [sp, #112]
ffff9000104a3328:       910183fd        add     x29, sp, #0x60
ffff9000104a332c:       d000eec8        adrp    x8, ffff90001227d000 <page_wait_table+0x14c0>
ffff9000104a3330:       f944a508        ldr     x8, [x8, #2376]
ffff9000104a3334:       ca5d53a1        eor     x1, x29, x29, lsr #20
ffff9000104a3338:       92ffe009        mov     x9, #0xffffffffffffff           // #72057594037927935
ffff9000104a333c:       910003ea        mov     x10, sp
ffff9000104a3340:       b3481c29        bfi     x9, x1, #56, #8
ffff9000104a3344:       910003e0        mov     x0, sp
ffff9000104a3348:       52800a02        mov     w2, #0x50                       // #80
ffff9000104a334c:       f81f83a8        stur    x8, [x29, #-8]
ffff9000104a3350:       8a090154        and     x20, x10, x9
ffff9000104a3354:       97fb24c7        bl      ffff90001036c670 <__hwasan_tag_memory>
ffff9000104a3358:       91004693        add     x19, x20, #0x11
ffff9000104a335c:       aa1303e0        mov     x0, x19
ffff9000104a3360:       97fb2466        bl      ffff90001036c4f8 <__hwasan_load1_noabort>
Comment 4 Andrey Konovalov 2020-06-16 12:42:54 UTC
This can be fixed by changing instrumentation, but I prefer the solution with untagging the kernel stacks. Userspace HWASAN can't have sp register tagged right now, and I think we should make the kernel do the same.
Comment 5 Walter Wu 2020-06-17 03:12:45 UTC
Agree, It makes sense that kernel and user have the same instrumentation.

I try to trace code and reproduce this issue after only modify untagging kernel stack, I found below code need to untag the sp?

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2575,7 +2576,7 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
                .child_tid      = child_tidptr,
                .parent_tid     = parent_tidptr,
                .exit_signal    = (clone_flags & CSIGNAL),
-               .stack          = newsp,
+               .stack          = kasan_reset_tag(newsp),
                .tls            = tls,
        };


After apply your and my patch, it always trigger the following one report during boot. I will see why it always is triggered.

==================================================================
[    0.002285] BUG: KASAN: invalid-access in format_decode+0x90/0x10fc
[    0.002306] Read of size 8 at addr 74ff900015447a00 by task swapper/0
[    0.002324] Pointer tag: [74], memory tag: [08]
[    0.002347]
[    0.002375] CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-next-20200408-dirty #4
[    0.002395] Hardware name: linux,dummy-virt (DT)
[    0.002412] Call trace:
[    0.002430]  dump_backtrace+0x0/0x578
[    0.002447]  show_stack+0x14/0x1c
[    0.002464]  dump_stack+0x188/0x260
[    0.002481]  print_address_description+0x8c/0x398
[    0.002498]  __kasan_report+0x14c/0x1dc
[    0.002515]  kasan_report+0x3c/0x58
[    0.002533]  check_memory_region+0x98/0xa0
[    0.002551]  __hwasan_loadN_noabort+0x14/0x1c
[    0.002568]  format_decode+0x90/0x10fc
[    0.002585]  vsnprintf+0x184/0x31e4
[    0.002602]  vscnprintf+0x80/0xd4
[    0.002619]  vprintk_store+0x98/0x93c
[    0.002636]  vprintk_emit+0x168/0x79c
[    0.002653]  vprintk_default+0x78/0xa8
[    0.002670]  vprintk_func+0x918/0x9a0
[    0.002687]  printk+0xb8/0xf0
[    0.002704]  kasan_init+0x2b8/0x2d8
[    0.002721]  setup_arch+0x460/0xbc8
[    0.002738]  start_kernel+0xe4/0xb88
[    0.002755]
[    0.002771]
[    0.002789] Memory state around the buggy address:
[    0.002807]  ffff900015447800: 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.002826]  ffff900015447900: ff ff ff ff ff 08 ff ff ff ff ff ff ff ff ff ff
[    0.002844] >ffff900015447a00: 08 ff ff ff ff ff ff ff e4 e4 ff ff ff ff ff ff
[    0.002861]                    ^
[    0.002879]  ffff900015447b00: ff 14 14 ff ff ff ff ff ff ff ff ff a4 a4 ff ff
[    0.002898]  ffff900015447c00: ff ff ff ff ff d4 d4 ff ff ff ff ff ff 94 94 d4
[    0.002916] ==================================================================
Comment 6 Andrey Konovalov 2020-06-17 12:31:58 UTC
newsp is a userspace pointer, it's incorrect to use it with kasan_reset_tag(). Why do you need to do anything with it anyway?
Comment 7 Walter Wu 2020-06-17 12:36:26 UTC
I forget it is syscall, we should not need modify this sp.
Comment 8 Walter Wu 2020-06-17 13:16:08 UTC
I always see below the report on qemu, the stack variable is 'spec', but it should not have corruption.

==================================================================
[    0.002089] BUG: KASAN: invalid-access in format_decode+0x90/0x10fc
[    0.002110] Read of size 8 at addr 74ff900015447a00 by task swapper/0
[    0.002127] Pointer tag: [74], memory tag: [08]
[    0.002148]
[    0.002175] CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-next-20200408-dirty #23
[    0.002193] Hardware name: linux,dummy-virt (DT)
[    0.002209] Call trace:
[    0.002225]  dump_backtrace+0x0/0x578
[    0.002242]  show_stack+0x14/0x1c
[    0.002258]  dump_stack+0x188/0x260
[    0.002274]  print_address_description+0x8c/0x398
[    0.002291]  __kasan_report+0x14c/0x1dc
[    0.002307]  kasan_report+0x3c/0x58
[    0.002323]  check_memory_region+0x98/0xa0
[    0.002339]  __hwasan_loadN_noabort+0x14/0x1c
[    0.002356]  format_decode+0x90/0x10fc
[    0.002372]  vsnprintf+0x184/0x31e4
[    0.002388]  vscnprintf+0x80/0xd4
[    0.002404]  vprintk_store+0x98/0x93c
[    0.002420]  vprintk_emit+0x168/0x79c
[    0.002436]  vprintk_default+0x78/0xa8
[    0.002452]  vprintk_func+0x918/0x9a0
[    0.002468]  printk+0xb8/0xf0
[    0.002484]  kasan_init+0x2b8/0x2d8
[    0.002500]  setup_arch+0x460/0xbc8
[    0.002517]  start_kernel+0xe4/0xb88
[    0.002532]
[    0.002548]
[    0.002564] Memory state around the buggy address:
[    0.002582]  ffff900015447800: 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.002599]  ffff900015447900: ff ff ff ff ff 08 ff ff ff ff ff ff ff ff ff ff
[    0.002617] >ffff900015447a00: 08 ff ff ff ff ff ff ff e4 e4 ff ff ff ff ff ff
[    0.002633]                    ^
[    0.002650]  ffff900015447b00: ff 14 14 ff ff ff ff ff ff ff ff ff a4 a4 ff ff
[    0.002668]  ffff900015447c00: ff ff ff ff ff d4 d4 ff ff ff ff ff ff 94 94 d4
[    0.002685] ==================================================================

I try to add below patch then I see that "BUG: KASAN: invalid-access in start_kernel". I am not sure whether you need this information.

--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2278,7 +2278,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
  * @precision: precision of a number
  * @qualifier: qualifier of a number (long, size_t, ...)
  */
-static noinline_for_stack
+static
 int format_decode(const char *fmt, struct printf_spec *spec)
 {
        const char *start = fmt;


My environment:
- clang-r377782d
- linux-next-20200408
Comment 9 Walter Wu 2020-06-18 13:03:44 UTC
==================================================================
[    0.000000] BUG: KASAN: invalid-access in start_kernel+0x718/0xb88
[    0.000000] Read of size 8 at addr 74ff900015447f70 by task swapper/0
[    0.000000] Pointer tag: [74], memory tag: [ff]
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-next-20200408-dirty #4
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] Call trace:
[    0.000000]  dump_backtrace+0x0/0x578
[    0.000000]  show_stack+0x14/0x1c
[    0.000000]  dump_stack+0x188/0x260
[    0.000000]  print_address_description+0x8c/0x398
[    0.000000]  __kasan_report+0x14c/0x1dc
[    0.000000]  kasan_report+0x3c/0x58
[    0.000000]  kasan_handler+0x88/0x22c
[    0.000000]  early_brk64+0x1c/0x38
[    0.000000]  do_debug_exception+0x4c4/0x814
[    0.000000]  el1_sync_handler+0x40/0x244
[    0.000000]  el1_sync+0x7c/0x100
[    0.000000]  start_kernel+0x718/0xb88
[    0.000000]
---
It looks like that the KASAN report is triggered by start_kernel(), as I 
remember that process 0 execute start_kernel(), the process 0 should be init_task, the tag of init_task.stack should be 0xff?


below is init_task structure.

struct task_struct init_task
#ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
    __init_task_data
#endif
= {
...
    .stack      = init_stack,
Comment 10 Andrey Konovalov 2020-06-19 14:22:25 UTC
No, init_stack is a global and it's not tagged.

The problem here is that in start_kernel() we tag stack variables before kasan_init(), and those tags end up in temporary shadow memory. Later we enable KASAN, switch to normal shadow, and start checking tags, which are missing in normal shadow.

We could disable KASAN instrumentation for start_kernel() (and probably for arm64's setup_arch() even though it doesn't have any stack variables right now).

Dmitry, WDYT?
Comment 11 Andrey Konovalov 2020-06-24 17:46:43 UTC
This won't work, as there's a number of functions that are inlined into start_kernel(), and we'll need to mark those with __no_sanitize_address too. We could disable instrumentation of kernel/init.c, but it seems quite harsh.
Comment 12 Andrey Konovalov 2020-06-24 17:49:04 UTC
Maybe we can only disable stack instrumentation for tag-based mode for those files via something like KASAN_SANITIZE_STACK_init.o := n.
Comment 13 Marco Elver 2020-06-24 17:58:52 UTC
(In reply to Andrey Konovalov from comment #11)
> This won't work, as there's a number of functions that are inlined into
> start_kernel(), and we'll need to mark those with __no_sanitize_address too.
> We could disable instrumentation of kernel/init.c, but it seems quite harsh.

Does tagged ASAN's inlining behaviour differ to ASAN? AFAIK, they don't, and inlined functions still "inherit" whether or not they're instrumented from the parent function. Same for __always_inline functions. So there is, AFAIK, no need to explicitly mark the inlined functions __no_sanitize.
Comment 14 Andrey Konovalov 2020-06-24 18:11:50 UTC
Hm, right, then the issue I'm getting is probably unrelated. Let me debug further. Thanks!
Comment 15 Dmitry Vyukov 2020-06-25 06:23:50 UTC
Walter, please pass the stack through the scripts/decode_stacktrace.sh script to add line numbers. It would be useful to understand what exactly variable causes the report.
In comments 8 and 9 you posted different stacks, which of these do you see and when?

I would try KASAN_SANITIZE_STACK_init.o := n anyway b/c it's just cheap to try.

One thing I would is the following. At the end of kasan_init we do:

	/* At this point kasan is fully initialized. Enable error messages */
	init_task.kasan_depth = 0;

I would try to move it way later, say to beginning of rest_init.
Regardless of result it will give some additional information: is the problem related to start_kernel only? or it affects later execution as well?
Comment 16 Walter Wu 2020-06-25 16:53:23 UTC
(In reply to Dmitry Vyukov from comment #15)
> Walter, please pass the stack through the scripts/decode_stacktrace.sh
> script to add line numbers. It would be useful to understand what exactly
> variable causes the report.
> In comments 8 and 9 you posted different stacks, which of these do you see
> and when?
> 
I can reproduce both reports. Please refer below the KASAN report in Comment 8. The corruption stack variable is "spec", but it should be a false positive, because it is declared by vsnprintf() and should be valid. I see the call stack, this report should be triggered by "pr_info("KernelAddressSanitizer initialized\n");". Unfortunately, it should be first false positive case after finish KASAN initialization.

format_decode() trigger many KASAN reports during booting. But if I remove the noinline_for_stack in format_decode(), I will not see any KASAN reports which trigger by format_decode(), then I only see the report in Comment 9.

==================================================================
[    0.002077] BUG: KASAN: invalid-access in format_decode (outfolder/../lib/vsprintf.c:2288)
[    0.002098] Read of size 8 at addr 74ff900015447a00 by task swapper/0
[    0.002116] Pointer tag: [74], memory tag: [08]
[    0.002138]
[    0.002167] CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-next-20200408-dirty #6
[    0.002187] Hardware name: linux,dummy-virt (DT)
[    0.002203] Call trace:
[    0.002220] dump_backtrace (outfolder/../arch/arm64/kernel/traps.c:87)
[    0.002236] show_stack (outfolder/../arch/arm64/kernel/traps.c:143)
[    0.002253] dump_stack (outfolder/../lib/dump_stack.c:121)
[    0.002269] print_address_description (outfolder/../mm/kasan/report.c:383)
[    0.002286] __kasan_report (outfolder/../mm/kasan/report.c:512)
[    0.002303] kasan_report (outfolder/../mm/kasan/common.c:625)
[    0.002319] check_memory_region (outfolder/../mm/kasan/tags.c:128)
[    0.002336] __hwasan_loadN_noabort (outfolder/../mm/kasan/tags.c:151)
[    0.002353] format_decode (outfolder/../lib/vsprintf.c:2288)
[    0.002369] vsnprintf (outfolder/../lib/vsprintf.c:2530)
[    0.002386] vscnprintf (outfolder/../lib/vsprintf.c:2679)
[    0.002411] vprintk_store (outfolder/../kernel/printk/printk.c:1918)
[    0.002429] vprintk_emit (outfolder/../kernel/printk/printk.c:1977)
[    0.002445] vprintk_default (outfolder/../kernel/printk/printk.c:2023)
[    0.002462] vprintk_func (outfolder/../kernel/printk/printk_safe.c:386)
[    0.002478] printk (outfolder/../kernel/printk/printk.c:2057)
[    0.002494] kasan_init (outfolder/../arch/arm64/mm/kasan_init.c:266)
[    0.002510] setup_arch (outfolder/../arch/arm64/kernel/setup.c:338)
[    0.002526] start_kernel (outfolder/../init/main.c:817)
[    0.002543]
[    0.002558]
[    0.002575] Memory state around the buggy address:
[    0.002593]  ffff900015447800: 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.002611]  ffff900015447900: ff ff ff ff ff 08 ff ff ff ff ff ff ff ff ff ff
[    0.002628] >ffff900015447a00: 08 ff ff ff ff ff ff ff e4 e4 ff ff ff ff ff ff
[    0.002645]                    ^
[    0.002663]  ffff900015447b00: ff 14 14 ff ff ff ff ff ff ff ff ff a4 a4 ff ff
[    0.002680]  ffff900015447c00: ff ff ff ff ff d4 d4 ff ff ff ff ff ff 94 94 d4
[    0.002698] ==================================================================


==================================================================
[    0.000000] BUG: KASAN: invalid-access in start_kernel (outfolder/../init/main.c:817)
[    0.000000] Read of size 8 at addr 74ff900015447f70 by task swapper/0
[    0.000000] Pointer tag: [74], memory tag: [ff]
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-next-20200408-dirty #5
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] Call trace:
[    0.000000] dump_backtrace (outfolder/../arch/arm64/kernel/traps.c:87)
[    0.000000] show_stack (outfolder/../arch/arm64/kernel/traps.c:143)
[    0.000000] dump_stack (outfolder/../lib/dump_stack.c:121)
[    0.000000] print_address_description (outfolder/../mm/kasan/report.c:383)
[    0.000000] __kasan_report (outfolder/../mm/kasan/report.c:512)
[    0.000000] kasan_report (outfolder/../mm/kasan/common.c:625)
[    0.000000] kasan_handler (outfolder/../arch/arm64/kernel/traps.c:1019)
[    0.000000] early_brk64 (outfolder/../arch/arm64/kernel/traps.c:1045)
[    0.000000] do_debug_exception (outfolder/../arch/arm64/mm/fault.c:873)
[    0.000000] el1_sync_handler (outfolder/../arch/arm64/kernel/entry-common.c:98)
[    0.000000] el1_sync (outfolder/../arch/arm64/kernel/entry.S:588)
[    0.000000] start_kernel (outfolder/../init/main.c:817)
[    0.000000]
[    0.000000]
[    0.000000] Memory state around the buggy address:
[    0.000000]  ffff900015447d00: 00 00 00 00 00 00 00 00 00 ff ff ff ff 00 ff ff
[    0.000000]  ffff900015447e00: 00 00 00 00 00 00 00 00 ff 00 ff 00 00 ff ff 00
[    0.000000] >ffff900015447f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.000000]                                         ^
[    0.000000]  ffff900015448000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.000000]  ffff900015448100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.000000] ==================================================================
Comment 17 Andrey Konovalov 2020-06-25 17:01:39 UTC
I think I fixed those, patches are here: https://github.com/xairy/linux/commits/up-kasan-stack-tags

With these patches and outline instrumentation the kernel without any false-positives for me, but there are still some issues with inline instrumentation.
Comment 18 Walter Wu 2020-06-29 07:38:30 UTC
I found inline instrumentation root cause. I am not sure if there are 
still other issues besides this.

First, I write one testcase which always reproduces false positive case and it is simpler, please refer below information.

struct foo {
    unsigned int    t1:8;
    unsigned int    t2:8;
    unsigned int    t3:8;
    unsigned int    t4:8;
    unsigned int    t5:8;
};

static noinline_for_stack
void fun(struct foo *spec)
{
    spec->t1 = 1; // always trigger kasan report, but it is false positives
}

static noinline
void kasan_stack_oob(void)
{
    struct foo spec = {0};
    fun(&spec);
}

[   10.508952] BUG: KASAN: invalid-access in fun+0x30/0x40
[   10.509222] Write of size 8 at addr 4bff000070b2f520 by task cat/179
[   10.509487] Pointer tag: [4b], memory tag: [08]
[   10.509676]
[   10.509859] CPU: 3 PID: 179 Comm: cat Tainted: G    B             5.6.0-next-20200408-dirty #25
[   10.510175] Hardware name: linux,dummy-virt (DT)
[   10.510362] Call trace:
[   10.510632]  dump_backtrace+0x0/0x578
[   10.510880]  show_stack+0x14/0x1c
[   10.511147]  dump_stack+0x188/0x260
[   10.511353]  print_address_description+0x8c/0x398
[   10.511588]  __kasan_report+0x14c/0x1dc
[   10.511796]  kasan_report+0x3c/0x58
[   10.511997]  check_memory_region+0x98/0xa0
[   10.512214]  __hwasan_storeN_noabort+0x14/0x1c
[   10.512439]  fun+0x30/0x40
[   10.512621]  kasan_stack_oob+0x70/0xcc


Second, see the below assembly code, I found the shadow memory which map the address, it store #8 value, please see that "mov w12, #0x8 // #8". It should store the tag of address? So we see many false positive KASAN reports always show "memory tag: [08]" after finish KASAN initialization. This is kernel or clang bug?

ffff900010d69c7c <kasan_stack_oob>:
ffff900010d69c7c:       d10103ff        sub     sp, sp, #0x40
ffff900010d69c80:       a9027bfd        stp     x29, x30, [sp, #32]
ffff900010d69c84:       a9034ff4        stp     x20, x19, [sp, #48]
ffff900010d69c88:       910083fd        add     x29, sp, #0x20
ffff900010d69c8c:       90023728        adrp    x8, ffff90001544d000 <page_wait_table+0x14c0>
ffff900010d69c90:       f944a508        ldr     x8, [x8, #2376]
ffff900010d69c94:       ca5d53a9        eor     x9, x29, x29, lsr #20
ffff900010d69c98:       92ffe00a        mov     x10, #0xffffffffffffff          // #72057594037927935
ffff900010d69c9c:       910003eb        mov     x11, sp
ffff900010d69ca0:       b3481d2a        bfi     x10, x9, #56, #8
ffff900010d69ca4:       d2d20013        mov     x19, #0x900000000000            // #158329674399744
ffff900010d69ca8:       8a0a0160        and     x0, x11, x10
ffff900010d69cac:       f2fdfff3        movk    x19, #0xefff, lsl #48
ffff900010d69cb0:       5280010c        mov     w12, #0x8                       // #8   ; w12 should be the value of shadow memory
ffff900010d69cb4:       d344fd74        lsr     x20, x11, #4
ffff900010d69cb8:       f81f83a8        stur    x8, [x29, #-8]
ffff900010d69cbc:       b2481c08        orr     x8, x0, #0xff00000000000000
ffff900010d69cc0:       38336a8c        strb    w12, [x20, x19]   ; w12 write into shadow memory
ffff900010d69cc4:       39003fe9        strb    w9, [sp, #15]
ffff900010d69cc8:       d344fd09        lsr     x9, x8, #4
ffff900010d69ccc:       3873692a        ldrb    w10, [x9, x19]
ffff900010d69cd0:       d378fc09        lsr     x9, x0, #56
ffff900010d69cd4:       6b0a013f        cmp     w9, w10
ffff900010d69cd8:       54000060        b.eq    ffff900010d69ce4 <kasan_stack_oob+0x68>  // b.none
ffff900010d69cdc:       7103fd3f        cmp     w9, #0xff
ffff900010d69ce0:       540001a1        b.ne    ffff900010d69d14 <kasan_stack_oob+0x98>  // b.any
ffff900010d69ce4:       f900001f        str     xzr, [x0]
ffff900010d69ce8:       94000018        bl      ffff900010d69d48 <fun>
...
ffff900010d69d48 <fun>:
ffff900010d69d48:       a9be7bfd        stp     x29, x30, [sp, #-32]!
ffff900010d69d4c:       a9014ff4        stp     x20, x19, [sp, #16]
ffff900010d69d50:       910003fd        mov     x29, sp
ffff900010d69d54:       52800101        mov     w1, #0x8                        // #8
ffff900010d69d58:       aa0003f3        mov     x19, x0
ffff900010d69d5c:       97f0647f        bl      ffff900010982f58 <__hwasan_loadN_noabort>
ffff900010d69d60:       f9400268        ldr     x8, [x19]
ffff900010d69d64:       52800101        mov     w1, #0x8                        // #8
ffff900010d69d68:       aa1303e0        mov     x0, x19
ffff900010d69d6c:       9278dd08        and     x8, x8, #0xffffffffffffff00
ffff900010d69d70:       b2400114        orr     x20, x8, #0x1
ffff900010d69d74:       97f06480        bl      ffff900010982f74 <__hwasan_storeN_noabort>
ffff900010d69d78:       f9000274        str     x20, [x19]
ffff900010d69d7c:       a9414ff4        ldp     x20, x19, [sp, #16]
ffff900010d69d80:       a8c27bfd        ldp     x29, x30, [sp], #32
ffff900010d69d84:       d65f03c0        ret
Comment 19 Andrey Konovalov 2020-06-30 12:32:04 UTC
Hi Walter,

Thanks for narrowing down the issue.

I'll be likely busy this week with other stuff, but I'll take a look on the next one.
Comment 20 Andrey Konovalov 2020-07-30 18:01:10 UTC
OK, finally got to working on this. It looks like the issue is caused by HWASAN short granules [1], which the kernel doesn't account for. Let me look more closely into this.

[1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
Comment 21 Andrey Konovalov 2020-07-30 18:25:29 UTC
Hi Walter,

I've updated the series with fixes here: https://github.com/xairy/linux/commits/up-kasan-stack-tags

Could you try them and see if they fix the false positive with inline instrumentation for you?

Thanks!
Comment 22 Walter Wu 2020-07-31 06:15:43 UTC
Hi Andrey,

Thanks for your help.

It is workable for me. It looks like no false-positive issue, and I think below patch should not need it. because it is fixed by hwasan-use-short-granules=0.
https://github.com/xairy/linux/commit/a48d20d2397d084af3790dfc5ddfc83b788ddd9a

After apply patches, I try to execute KASAN UT to verify them. The stack oob test should need to be fixed. Could you send below patch together?
Thanks.

---
lib/test_kasan.c: fix the stack OOB test for tag-based KASAN

With tag based KASAN, the stack OOB test doesn't trigger out-of-bounds
memory access. This test need to be fixed.

Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -488,7 +488,7 @@ static noinline void __init kasan_global_oob(void)
 static noinline void __init kasan_stack_oob(void)
 {
        char stack_array[10];
-       volatile int i = 0;
+       volatile int i = OOB_TAG_OFF;
        char *p = &stack_array[ARRAY_SIZE(stack_array) + i];

        pr_info("out-of-bounds on stack\n");
Comment 23 Andrey Konovalov 2020-07-31 13:21:40 UTC
Yeah, we have KASAN_SANITIZE:=n for arm64 kasan_init.c, so that patch isn't needed indeed.

I've mailed the series, thanks!
Comment 24 Andrey Konovalov 2020-08-10 14:14:32 UTC
Basic stack instrumentation support has been merged.

No short granule support for now, filed: https://bugzilla.kernel.org/show_bug.cgi?id=208865