When profiling a simple C program with nested function calls using DWARF call graphs, perf report shows incorrect call graph relationships, including impossible self-calls (merge -> merge) and skipped stack frames (merge_all directly calling spinlock).

Reproducer code:

    #include <stdint.h>

    #define VOLATILE_ACCESS(x) (*(volatile typeof(x)*)&(x))

    __attribute__((noinline))
    void spinlock(void) {
        const uint64_t ITERATIONS = 1000;
        uint64_t counter = 0;
        while (counter < ITERATIONS) {
            VOLATILE_ACCESS(counter);
            counter += 1;
        }
    }

    __attribute__((noinline))
    void merge(void) {
        for (int i = 0; i < 100; i++) {
            spinlock();
        }
    }

    __attribute__((noinline))
    void merge_all(void) {
        for (int i = 0; i < 1000; i++) {
            merge();
        }
    }

    int main(void) {
        for (int i = 0; i < 10; i++) {
            merge_all();
        }
        return 0;
    }

Build & reproduce:

1. gcc -O3 perf_test.c -o perf_test
2. perf record --call-graph dwarf ./perf_test
3. perf report

Actual output:

    -   99.94%     0.00%  main  main  [.] _start
         _start
         __libc_start_main@@GLIBC_2.34
         __libc_start_call_main
         main
       - merge_all
          - 98.73% merge
               98.23% spinlock
               0.50% merge
            1.22% spinlock

Expected call hierarchy:

    main ─ merge_all ─ merge ─ spinlock

Instead, the call graph shows merge calling itself and merge_all directly calling spinlock, which is impossible given the code structure.

System details:

perf version: 6.12.9-100.fc40.x86_64
kernel version: 6.9.7-200.fc40.x86_64
gcc version: 14.1.1 20240701 (Red Hat 14.1.1-7)
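
For anyone who wants to rule out the program itself, here is a minimal sketch that checks the expected hierarchy (main, merge_all, merge, spinlock) at runtime with a hand-maintained shadow stack, independent of perf's unwinder. It is not part of the original reproducer: the helpers enter, leave, check_hierarchy and call_count, the HIERARCHY_DEMO_MAIN macro, and the shortened spin loop are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shadow stack: tracks which function is currently executing. */
enum frame { F_MAIN, F_MERGE_ALL, F_MERGE, F_SPINLOCK, F_COUNT };

static enum frame shadow[8];
static int depth = 0;
static uint64_t violations = 0;
static uint64_t calls[F_COUNT];

/* Enter `self`; count a violation if the caller is not `expected_parent`. */
static void enter(enum frame self, enum frame expected_parent)
{
    assert(depth < 8);
    if (depth == 0 || shadow[depth - 1] != expected_parent)
        violations++;
    shadow[depth++] = self;
    calls[self]++;
}

static void leave(void)
{
    depth--;
}

__attribute__((noinline))
static void spinlock(void)
{
    enter(F_SPINLOCK, F_MERGE);
    /* Assumption: a much shorter spin than the reproducer, to keep this fast. */
    volatile uint64_t counter = 0;
    while (counter < 10)
        counter += 1;
    leave();
}

__attribute__((noinline))
static void merge(void)
{
    enter(F_MERGE, F_MERGE_ALL);
    for (int i = 0; i < 100; i++)
        spinlock();
    leave();
}

__attribute__((noinline))
static void merge_all(void)
{
    enter(F_MERGE_ALL, F_MAIN);
    for (int i = 0; i < 1000; i++)
        merge();
    leave();
}

/* Runs the same loop nest as the reproducer's main; returns violation count. */
uint64_t check_hierarchy(void)
{
    depth = 0;
    violations = 0;
    for (int i = 0; i < F_COUNT; i++)
        calls[i] = 0;

    shadow[depth++] = F_MAIN;   /* stand-in for main's own frame */
    calls[F_MAIN]++;
    for (int i = 0; i < 10; i++)
        merge_all();
    depth--;
    return violations;
}

/* Number of times function `f` was entered during the last check. */
uint64_t call_count(enum frame f)
{
    return calls[f];
}

#ifdef HIERARCHY_DEMO_MAIN
int main(void)
{
    uint64_t bad = check_hierarchy();
    printf("violations: %llu\n", (unsigned long long)bad);
    return bad != 0;
}
#endif
```

Building with -DHIERARCHY_DEMO_MAIN gives a standalone program. Since every function is only ever entered from its expected parent, the violation count stays at zero, which supports the point that the merge -> merge and merge_all -> spinlock edges come from the unwinding or reporting side rather than from the code.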
I'm seeing this same thing.

CPU: AMD 5975WX
perf: 6.11.4-301.fc41.x86_64 and 6.12.11-200.fc41
kernel: 6.12.11-200.fc41 and 6.11.4-301.fc41
compiler: gcc-14.2.1-7 and clang-19.1.7-2.fc41