Bug 211801
| Summary: | perf: util/unwind-libdw.c:30: __find_debuginfo: Assertion `dso' failed. | | |
|---|---|---|---|
| Product: | Tracing/Profiling | Reporter: | Dave Rigby (d.rigby) |
| Component: | Perf tool | Assignee: | Arnaldo Carvalho de Melo (acme) |
| Status: | NEW --- | | |
| Severity: | high | CC: | jan, jolsa, jolsa, rjones |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | perf version 5.11.0-1.el7.elrepo.x86_64 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | perf script -v -v --no-inline; Proposed fix v1 | | |
Description
Dave Rigby
2021-02-16 14:24:46 UTC
Note this is a "semi"-regression - with perf-5.10.16-1.el7.elrepo I don't see the crash, however it seems like perf has failed to locate the DWARF info for the given library, as any backtraces including it are not decoded correctly:

    # perf --version
    perf version 5.10.16-1.el7.elrepo.x86_64
    # perf script --no-inline | grep -B5 -A5 libcouchstore.so
    <cut>
    WriterPool3 1970 337490.608958: 20408163 cpu-clock:pppH:
        7f40109ee93f compare_records+0xf (/opt/couchbase/lib/libcouchstore.so)
    <cut>

Note no frames before or after `compare_records` are correctly decoded, _but_ others are fine and `perf script` exits with status zero.

Could you please get output from:

    # perf script -vv

it might show what went wrong

Created attachment 295333
perf script -v -v --no-inline
Attached result of:
perf script -v -v --no-inline >/dev/null 2> perf_script_vv.txt
Just ran it in gdb to check I was still seeing the same exception; I believe the problem is that mod->userdata is null (which is the assert which is firing), but it wasn't immediately obvious to me where in the various callbacks that is supposed to be set.

    #5  0x00007ffff7052e8f in find_debuginfo (mod=mod@entry=0x15c8470) at dwfl_module_getdwarf.c:539
    539       mod->debug.fd = (*mod->dwfl->callbacks->find_debuginfo) (MODCB_ARGS (mod),
    (gdb) p *mod
    $2 = {dwfl = 0x15cea20, next = 0x0, userdata = 0x0, name = 0xd2dfb0 "libcouchstore.so",
      low_addr = 139876824051712, high_addr = 139876826574744,
      main = {name = 0xdb9ae0 "/opt/couchbase/lib/libcouchstore.so", fd = 20, valid = false,
        relocated = false, elf = 0x15be9c0, vaddr = 0, address_sync = 350598},
      debug = {name = 0x0, fd = 0, valid = false, relocated = false, elf = 0x0, vaddr = 0,
        address_sync = 0},
      aux_sym = {name = 0x0, fd = 0, valid = false, relocated = false, elf = 0x0, vaddr = 0,
        address_sync = 0},
      main_bias = 139876824051712, ebl = 0x15dcb20, e_type = 3, elferr = DWFL_E_NOERROR,
      reloc_info = 0x0, symfile = 0x0, symdata = 0x0, aux_symdata = 0x0, syments = 0,
      aux_syments = 0, first_global = 0, aux_first_global = 0, symstrdata = 0x0,
      aux_symstrdata = 0x0, symxndxdata = 0x0, aux_symxndxdata = 0x0, elfdir = 0x0, dw = 0x0,
      alt = 0x0, alt_fd = 0, alt_elf = 0x0, symerr = DWFL_E_NOERROR, dwerr = DWFL_E_NO_DWARF,
      first_cu = 0x0, cu = 0x0, lazy_cu_root = 0x0, aranges = 0x0, build_id_bits = 0x0,
      build_id_vaddr = 0, build_id_len = 0, ncu = 0, lazycu = 0, naranges = 0, dwarf_cfi = 0x0,
      eh_cfi = 0x15c82d0, segment = 2, gc = false, is_executable = false}

Created attachment 295341
Proposed fix v1

I suspect the problem might be due to the fact that userdata is not set up if __report_module fails to find mod via dwfl_addrmodule in the previously-mentioned patch[1]:

    @@ -46,16 +60,24 @@ static int __report_module(struct addr_location *al, u64 ip,
     	mod = dwfl_addrmodule(ui->dwfl, ip);
     	if (mod) {
     		Dwarf_Addr s;
    +		void **userdatap;
     
    -		dwfl_module_info(mod, NULL, &s, NULL, NULL, NULL, NULL, NULL);
    +		dwfl_module_info(mod, &userdatap, &s, NULL, NULL, NULL, NULL, NULL);
    +		*userdatap = dso;
     		if (s != al->map->start - al->map->pgoff)
     			mod = 0;
     	}
     
     	if (!mod)
    -		mod = dwfl_report_elf(ui->dwfl, dso->short_name,
    -				      (dso->symsrc_filename ? dso->symsrc_filename : dso->long_name), -1, al->map->start - al->map->pgoff,
    -				      false);
    +		mod = dwfl_report_elf(ui->dwfl, dso->short_name, dso->long_name, -1,
    +				      al->map->start - al->map->pgoff, false);
    +	if (!mod) {
    +		char filename[PATH_MAX];
    +
    +		if (dso__build_id_filename(dso, filename, sizeof(filename), false))
    +			mod = dwfl_report_elf(ui->dwfl, dso->short_name, filename, -1,
    +					      al->map->start - al->map->pgoff, false);
    +	}
     
     	return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
     }

Note how `*userdatap` is only initialised with dso if the first call to dwfl_addrmodule returns non-null. However, if one of the two other attempts to look up `mod` succeeds, `*userdatap` is never initialised.

With the attached patch I no longer see the assert and backtrace decode seems to work as well as with libunwind.

Note I'm flying pretty blind here, I'm not at all familiar with the subtleties of ELF / DWARF module manipulation; this is just a dumb "set a field which looks like it should have been set" :) I also don't know enough about the different lookup methods here to understand exactly why dwfl_addrmodule failed for my module...

[1]: "perf unwind: Fix separate debug info files when using elfutils' libdw… " - https://github.com/torvalds/linux/commit/bf53fc6b5f415cddc7118091cb8fd6a211b2320d

yes, looks like userdatap might not be initialized, I asked Jan to look at it

Could you please send the patch for review and make sure Jan is CC-ed? thanks

Sure - can you direct me to where I should post the patch (I'm not a regular kernel contributor...).
please send it to:

    linux-perf-users@vger.kernel.org
    Arnaldo Carvalho de Melo <acme@kernel.org>
    Jan Kratochvil <jan.kratochvil@redhat.com>
    Jiri Olsa <jolsa@redhat.com>

no worries about the patch/changelog shape (just should be apply-able) let's start the discussion and we will tune the rest

I agree with the patch. I wrote exactly the same patch before I found the one of yours here. I just do not have the problem reproducible but if it fixes the problem for you that's all we need.

I'll collect the Acked-by: Jan and Jiri, ok?

Thanks. This is the first patch I've submitted to Linux so any assistance appreciated :)

Found the same issue in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1930640

I tested your patch (https://www.spinics.net/lists/linux-perf-users/msg12625.html) and it fixes the problem for me.
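
For readers skimming the thread, the failure mode discussed above (userdata attached on only one of several lookup paths) can be illustrated with a small stand-alone C sketch. This is not the perf or libdw code and not the attached patch: every type and function name below (`fake_module`, `fake_dso`, `lookup_cached`, `report_new`, `report_module_buggy`, `report_module_fixed`) is hypothetical, and the model has only one fallback path where the thread describes two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a libdw module: only the field that matters here. */
struct fake_module {
	void *userdata;
};

/* Hypothetical stand-in for perf's struct dso. */
struct fake_dso {
	const char *long_name;
};

/* One module that may already be known, one that is created on demand. */
static struct fake_module cached_module;
static struct fake_module reported_module;

/* Models the first lookup (dwfl_addrmodule in the thread): may return NULL. */
static struct fake_module *lookup_cached(int have_cached)
{
	return have_cached ? &cached_module : NULL;
}

/* Models a fallback (dwfl_report_elf in the thread): yields a fresh module. */
static struct fake_module *report_new(void)
{
	reported_module.userdata = NULL;
	return &reported_module;
}

/* Buggy shape: userdata is attached only when the first lookup succeeds. */
static struct fake_module *report_module_buggy(struct fake_dso *dso, int have_cached)
{
	struct fake_module *mod = lookup_cached(have_cached);

	if (mod)
		mod->userdata = dso;
	if (!mod)
		mod = report_new();	/* userdata stays NULL on this path */
	return mod;
}

/* Fixed shape: attach userdata once, after every lookup path has run. */
static struct fake_module *report_module_fixed(struct fake_dso *dso, int have_cached)
{
	struct fake_module *mod = lookup_cached(have_cached);

	if (!mod)
		mod = report_new();
	if (mod)
		mod->userdata = dso;
	return mod;
}
```

In this model, calling `report_module_buggy` with `have_cached` set to 0 returns a module whose `userdata` is still NULL, which is the condition a later callback asserting on a non-null dso would trip over; `report_module_fixed` returns a module with `userdata` set on both paths.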