[ First filed at https://sourceware.org/bugzilla/show_bug.cgi?id=31486 ] Consider the following scenario on aarch64: - we set a hardware watchpoint of size 8 (the maximum size) on some memory location p+8. - we allow execution to happen, and an stp instruction is executed, writing and 8-byte register r1 to p and another 8-byte register r2 to p+8. - the hardware watchpoint triggers. The trigger address reported by the watchpoint can be p, or p+8. This depends on the cpu. I was able to observe both behaviours, the p case on an Apple M1, and the p+8 case on both a Rockchip RK3399 and a MediaTek MT8183. In the p+8 case, it's easy to match the trigger address to the watchpoint. In the p case, less so. This case is known as an having an "inexact watchpoint address", and handling was added in the linux kernel by commit fdfeff0f9e3d ("arm64: hw_breakpoint: Handle inexact watchpoint addresses"). The responsibility of the watchpoint_handler in arch/arm64/kernel/hw_breakpoint.c is to decide whether to report a SIGTRAP to the user, and with what trigger address. The trigger address used is simply what the hardware reported, so in the scenario above that's either p or p+8. Now let's see how that plays out in user space, using the executable from the gdb testcase gdb.base/watchpoint-unaligned.exp. First, with lldb (version 17.0.6): ... $ lldb watchpoint-unaligned \ -o "b main" \ -o run \ -o "watch set var data.u.size8twice[1]" \ -o c \ -o q (lldb) target create "watchpoint-unaligned" Current executable set to '/home/vries/gdb/watchpoint-unaligned' (aarch64). (lldb) b main Breakpoint 1: where = watchpoint-unaligned`main + 8 at watchpoint-unaligned.c:65:3, address = 0x0000000000410208 (lldb) run Process 1077373 stopped * thread #1, name = 'watchpoint-unal', stop reason = breakpoint 1.1 frame #0: 0x0000000000410208 watchpoint-unaligned`main at watchpoint-unaligned.c:65:3 62 63 assert (sizeof (data) == 8 + 3 * 8); 64 -> 65 write_size8twice (); 66 67 while (size) 68 { Process 1077373 launched: '/home/vries/gdb/watchpoint-unaligned' (aarch64) (lldb) watch set var data.u.size8twice[1] Watchpoint created: Watchpoint 1: addr = 0x00440048 size = 8 state = enabled type = w declare @ '/home/vries/gdb/src/gdb/testsuite/gdb.base/watchpoint-unaligned.c:35:3' watchpoint spec = 'data.u.size8twice[1]' new value: 0 (lldb) c Process 1077373 resuming Process 1077373 stopped * thread #1, name = 'watchpoint-unal', stop reason = trace frame #0: 0x00000000004101f0 watchpoint-unaligned`write_size8twice at watchpoint-unaligned.c:48:3 45 46 #ifdef __aarch64__ 47 volatile void *p = &data.u.size8twice[offset]; -> 48 asm volatile ("stp %1, %2, [%0]" 49 : /* output */ 50 : "r" (p), "r" (first), "r" (second) /* input */ 51 : "memory" /* clobber */); (lldb) q ... So, rather than reporting a watchpoint stop (showing old/new values), lldb reports a "trace" stop. Current gdb does better, since commit 9a03f218534 ("[gdb/tdep] Fix gdb.base/watchpoint-unaligned.exp on aarch64"): ... $ gdb -q -batch -iex "set trace-commands on" watchpoint-unaligned \ -ex start \ -ex "watch data.u.size8twice[1]" \ -ex c +start Temporary breakpoint 1 at 0x410208: file watchpoint-unaligned.c, line 65. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Temporary breakpoint 1, main () at watchpoint-unaligned.c:65 65 write_size8twice (); +watch data.u.size8twice[1] Hardware watchpoint 2: data.u.size8twice[1] +c Hardware watchpoint 2: data.u.size8twice[1] Old value = 0 New value = 2 write_size8twice () at watchpoint-unaligned.c:56 56 } ... This was using a regular watchpoint (observes value changed), but say we change this to an access watchpoint (observes reads or writes). With lldb, the behaviour is similar, but with gdb we run into a hang ( see https://sourceware.org/bugzilla/show_bug.cgi?id=31486 ). To understand why this hang happens, consider what happens when gdb receives a SIGTRAP. It classifies it as one of: - hardware watchpoint trigger, - hardware breakpoint trigger, or - deleted hardware watchpoint/breakpoint trigger. In case of a hardware watchpoint trigger, it may be necessary to briefly disable the watchpoint to be able to continue past the instruction triggering the watchpoint. The hang happens because the hardware watchpoint trigger is classified as "deleted hardware watchpoint/breakpoint trigger". It is ignored, but when trying to continue past the instruction, we keep running into the same watchpoint over and over again. This can be trivially fixed by the same type of fix as in 9a03f218534: loosening the matching criteria to allow matches 8 bytes before the watchpoint address. This however will result in false positives. In the case of the regular watchpoints, we filter those out on the basis that the watched values didn't change. But that's not possible for access watchpoints. Note also that the fix in commit 9a03f218534 itself is not ideal. It contains a hardcoded value max_access_size, set to 16, which might need updating to handle cases with SVE instructions, or mops instructions ( see also https://sourceware.org/bugzilla/show_bug.cgi?id=31484 ). It might be possible to detect more reliably whether a SIGTRAP is a "deleted hardware watchpoint/breakpoint trigger" by keeping track of previously set hardware watchpoint/breakpoint values. But this is likely fragile and difficult to maintain. Which all begs the question whether things could be easier and more reliable in user space if the kernel would report more info than the trigger address for a SIGTRAP. My suspicion is that the current situation of only reporting the trigger address was designed catering for exact watchpoint addresses. With exact watchpoint addresses, classifying the SIGTRAP is trivial. AFAIU, if the kernel would also communicate: - whether something is a hardware watchpoint or hardware breakpoint, and - in the case of a hardware watchpoint, which watchpoint triggered (say in the form of the values of the matching debug watchpoint control and value registers), then it would be trivial to do the classification in gdb. Whether the kernel picks the correct watchpoint or not is an orthogonal issue, see commit fdfeff0f9e3d which describes the heuristic as "a bit dodgy".