Bug 218628

Summary: Aarch64 inexact watchpoint address make SIGTRAP hard to interpret
Product: Platform Specific/Hardware Reporter: Tom de Vries (vries)
Component: ARMAssignee: linux-arm-kernel (linux-arm-kernel)
Status: NEW ---    
Severity: normal    
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:

Description Tom de Vries 2024-03-22 16:21:08 UTC
[ First filed at https://sourceware.org/bugzilla/show_bug.cgi?id=31486 ] 

Consider the following scenario on aarch64:
- we set a hardware watchpoint of size 8 (the maximum size) on some memory
  location p+8.
- we allow execution to happen, and an stp instruction is executed, writing
  and 8-byte register r1 to p and another 8-byte register r2 to p+8.
- the hardware watchpoint triggers.

The trigger address reported by the watchpoint can be p, or p+8.

This depends on the cpu.  I was able to observe both behaviours, the p case on
an Apple M1, and the p+8 case on both a Rockchip RK3399 and a MediaTek MT8183.

In the p+8 case, it's easy to match the trigger address to the watchpoint.

In the p case, less so.  This case is known as an having an "inexact
watchpoint address", and handling was added in the linux kernel by commit
fdfeff0f9e3d ("arm64: hw_breakpoint: Handle inexact watchpoint addresses").

The responsibility of the watchpoint_handler in
arch/arm64/kernel/hw_breakpoint.c is to decide whether to report a SIGTRAP to
the user, and with what trigger address.

The trigger address used is simply what the hardware reported, so in the
scenario above that's either p or p+8.

Now let's see how that plays out in user space, using the executable from the
gdb testcase gdb.base/watchpoint-unaligned.exp.

First, with lldb (version 17.0.6):
...
$ lldb watchpoint-unaligned \
    -o "b main" \
    -o run \
    -o "watch set var data.u.size8twice[1]" \
    -o c \
    -o q
(lldb) target create "watchpoint-unaligned"
Current executable set to '/home/vries/gdb/watchpoint-unaligned' (aarch64).
(lldb) b main
Breakpoint 1: where = watchpoint-unaligned`main + 8 at watchpoint-unaligned.c:65:3, address = 0x0000000000410208
(lldb) run
Process 1077373 stopped
* thread #1, name = 'watchpoint-unal', stop reason = breakpoint 1.1
    frame #0: 0x0000000000410208 watchpoint-unaligned`main at watchpoint-unaligned.c:65:3
   62  
   63     assert (sizeof (data) == 8 + 3 * 8);
   64  
-> 65     write_size8twice ();
   66  
   67     while (size)
   68       {
Process 1077373 launched: '/home/vries/gdb/watchpoint-unaligned' (aarch64)
(lldb) watch set var data.u.size8twice[1]
Watchpoint created: Watchpoint 1: addr = 0x00440048 size = 8 state = enabled type = w
    declare @ '/home/vries/gdb/src/gdb/testsuite/gdb.base/watchpoint-unaligned.c:35:3'
    watchpoint spec = 'data.u.size8twice[1]'
    new value: 0
(lldb) c
Process 1077373 resuming
Process 1077373 stopped
* thread #1, name = 'watchpoint-unal', stop reason = trace
    frame #0: 0x00000000004101f0 watchpoint-unaligned`write_size8twice at watchpoint-unaligned.c:48:3
   45  
   46   #ifdef __aarch64__
   47     volatile void *p = &data.u.size8twice[offset];
-> 48     asm volatile ("stp %1, %2, [%0]"
   49                   : /* output */
   50                   : "r" (p), "r" (first), "r" (second) /* input */
   51                   : "memory" /* clobber */);
(lldb) q
...

So, rather than reporting a watchpoint stop (showing old/new values), lldb
reports a "trace" stop.

Current gdb does better, since commit 9a03f218534 ("[gdb/tdep] Fix
gdb.base/watchpoint-unaligned.exp on aarch64"):
...
$ gdb -q -batch -iex "set trace-commands on" watchpoint-unaligned \
    -ex start \
    -ex "watch data.u.size8twice[1]" \
    -ex c
+start
Temporary breakpoint 1 at 0x410208: file watchpoint-unaligned.c, line 65.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Temporary breakpoint 1, main () at watchpoint-unaligned.c:65
65        write_size8twice ();
+watch data.u.size8twice[1]
Hardware watchpoint 2: data.u.size8twice[1]
+c

Hardware watchpoint 2: data.u.size8twice[1]

Old value = 0
New value = 2
write_size8twice () at watchpoint-unaligned.c:56
56      }
...

This was using a regular watchpoint (observes value changed),
but say we change this to an access watchpoint (observes reads
or writes).

With lldb, the behaviour is similar, but with gdb we run into a hang ( see
https://sourceware.org/bugzilla/show_bug.cgi?id=31486 ).

To understand why this hang happens, consider what happens when gdb receives a
SIGTRAP.  It classifies it as one of:
- hardware watchpoint trigger,
- hardware breakpoint trigger, or
- deleted hardware watchpoint/breakpoint trigger.

In case of a hardware watchpoint trigger, it may be necessary to briefly
disable the watchpoint to be able to continue past the instruction triggering
the watchpoint.

The hang happens because the hardware watchpoint trigger is classified as
"deleted hardware watchpoint/breakpoint trigger".  It is ignored, but when
trying to continue past the instruction, we keep running into the same
watchpoint over and over again.

This can be trivially fixed by the same type of fix as in 9a03f218534: loosening
the matching criteria to allow matches 8 bytes before the watchpoint address.

This however will result in false positives.  In the case of the regular
watchpoints, we filter those out on the basis that the watched values didn't
change.  But that's not possible for access watchpoints.

Note also that the fix in commit 9a03f218534 itself is not ideal.  It contains
a hardcoded value max_access_size, set to 16, which might need updating to
handle cases with SVE instructions, or mops instructions ( see also
https://sourceware.org/bugzilla/show_bug.cgi?id=31484 ).

It might be possible to detect more reliably whether a SIGTRAP is a "deleted
hardware watchpoint/breakpoint trigger" by keeping track of previously set
hardware watchpoint/breakpoint values.  But this is likely fragile and
difficult to maintain.

Which all begs the question whether things could be easier and more reliable
in user space if the kernel would report more info than the trigger address
for a SIGTRAP.

My suspicion is that the current situation of only reporting the trigger
address was designed catering for exact watchpoint addresses.  With exact
watchpoint addresses, classifying the SIGTRAP is trivial.

AFAIU, if the kernel would also communicate:
- whether something is a hardware watchpoint or hardware breakpoint, and
- in the case of a hardware watchpoint, which watchpoint triggered
  (say in the form of the values of the matching debug watchpoint control and
  value registers),
then it would be trivial to do the classification in gdb.

Whether the kernel picks the correct watchpoint or not is an orthogonal issue,
see commit fdfeff0f9e3d which describes the heuristic as "a bit dodgy".