Bug 203869 - Abort on libkshark-model.c:242 assertion row != BSEARCH_ALL_GREATER
Summary: Abort on libkshark-model.c:242 assertion row != BSEARCH_ALL_GREATER
Status: RESOLVED CODE_FIX
Alias: None
Product: Tools
Classification: Unclassified
Component: Trace-cmd/Kernelshark
Hardware: ARM Linux
Importance: P1 normal
Assignee: Default virtual assignee for Trace-cmd and kernelshark
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-12 09:21 UTC by Alan Mikhak
Modified: 2019-07-09 08:36 UTC
CC List: 2 users

See Also:
Kernel Version: 4.19.42
Subsystem:
Regression: No
Bisected commit-id:


Attachments
trace.dat file for this issue (3.20 MB, application/octet-stream)
2019-06-12 09:59 UTC, Alan Mikhak
Combined patch for manual application of Yordan patches (6.16 KB, application/mbox)
2019-06-15 10:21 UTC, Alan Mikhak

Description Alan Mikhak 2019-06-12 09:21:11 UTC
I observed KernelShark aborting on Raspberry Pi 3 model B+ with the following error when I clicked on the second trace record that was displayed:

kernelshark: libkshark-model.c:242: ksmodel_set_upper_edge: Assertion 'row != BSEARCH_ALL_GREATER' failed.
Aborted

For more information, please see minute 0:57 of the video I posted on my YouTube channel, which shows the KernelShark abort as it happened:
https://www.youtube.com/watch?v=pWRuNQTo9-8

Regards,
Alan Mikhak
Comment 1 Alan Mikhak 2019-06-12 09:33:19 UTC
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Raspbian
Description:	Raspbian GNU/Linux 9.9 (stretch)
Release:	9.9
Codename:	stretch

$ uname -a
Linux rpi7 4.19.42-v7+ #1219 SMP Tue May 14 21:20:58 BST 2019 armv7l GNU/Linux
Comment 2 Yordan Karadzhov 2019-06-12 09:36:41 UTC
Hi Alan,

Please attach the trace.dat file that makes KernelShark crash.

Thanks!
Yordan
Comment 3 Alan Mikhak 2019-06-12 09:59:06 UTC
Created attachment 283219 [details]
trace.dat file for this issue

Hi Yordan,

Please see the attached trace.dat file which KernelShark was processing when it raised this assertion.

Regards,
Alan
Comment 4 Yordan Karadzhov 2019-06-12 11:09:41 UTC
Hi Alan,

From what I see in the video, my guess is that something is going wrong in
static void ksmodel_set_in_range_bining()

https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/tree/kernel-shark/src/libkshark-model.c#n96

In this function we once again mix up size_t and uint64_t. Maybe "histo->min" and "histo->max" are getting nonsensical values, which later causes the binary search to fail. Unfortunately, it will be very hard for me to get a setup on which I can reproduce this problem, so I was thinking that maybe you could do some debugging on your system.

We will greatly appreciate your help!
Cheers,
Yordan
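
To illustrate the size_t versus uint64_t concern raised above: on a 32-bit armv7l build size_t is typically 4 bytes, so a nanosecond timestamp stored in it keeps only its low 32 bits. The following is a minimal sketch, not KernelShark code. The names size32_t, store_in_size32 and show_truncation are hypothetical, and uint32_t stands in for a 32-bit size_t so that the effect is also visible on a 64-bit host:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for size_t on a 32-bit ARM build (assumption: 4-byte size_t). */
typedef uint32_t size32_t;

/* Hypothetical helper: stores a 64-bit timestamp in a 32-bit variable. */
static uint64_t store_in_size32(uint64_t ts_ns)
{
	size32_t narrowed = (size32_t)ts_ns; /* only the low 32 bits survive */

	return narrowed;
}

/* Prints a timestamp next to the value left after the narrowing. */
static void show_truncation(uint64_t ts_ns)
{
	printf("%" PRIu64 " -> %" PRIu64 "\n", ts_ns, store_in_size32(ts_ns));
}
```

For example, show_truncation(5000000000ULL) prints "5000000000 -> 705032704", because 5000000000 - 2^32 = 705032704. Any timestamp beyond roughly 4.29 seconds of nanoseconds is affected in this way.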
Comment 5 Yordan Karadzhov 2019-06-12 14:27:58 UTC
Hi Alan,

Please try applying the two patches here
https://patchwork.kernel.org/project/linux-trace-devel/list/?series=131053

and check if this has any effect on the problem that you see on your Raspberry Pi.

Thanks!
Yordan
Comment 6 Alan Mikhak 2019-06-13 04:41:59 UTC
Hi Yordan,

I built trace-cmd and KernelShark from a fresh clone of the official git repo with both patches applied. I then recorded a fresh trace.dat file with the same command as before.

$ sudo trace-cmd record -e sched ls -ltr /usr > /dev/null

KernelShark still aborted on my Raspberry Pi 3 model B+ when I clicked on the second entry. However, it produced the following new error message before asserting:

qt.qpa.xcb: QXcbConnection: XCB error: 148 (Unknown), sequence: 182, resource id: 0, major code: 140 (Unknown), minor code: 20

kernelshark: /home/pi/src/kernel.org/trace-cmd/kernel-shark/src/libkshark-model.c:242: ksmodel_set_upper_edge: Assertion `row != BSEARCH_ALL_GREATER' failed.
Aborted

Regards,
Alan
Comment 7 Alan Mikhak 2019-06-14 04:43:16 UTC
Hi Yordan,

I installed trace-cmd and kernelshark on my Raspberry Pi 3 model B+
using the following command:

$ sudo apt install trace-cmd kernelshark

The version of kernelshark installed by 'apt install' runs smoothly on
my Raspberry Pi 3 model B+ and doesn't exhibit the abort issue I
reported here about the version I built from sources.

Regards,
Alan
Comment 8 Yordan Karadzhov 2019-06-14 12:30:10 UTC
The Ubuntu package ships the 10-year-old GTK version of KernelShark.
The new version (the one that crashes on your Raspberry Pi 3) was recently developed from scratch and is based on Qt.

BTW I believe I found the problem responsible for the crash. It is here:

https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/tree/kernel-shark/src/libkshark-model.c#n587



void ksmodel_jump_to(struct kshark_trace_histo *histo, size_t ts)
{
	size_t min, max, range_min;
...

this must be changed to:

void ksmodel_jump_to(struct kshark_trace_histo *histo, uint64_t ts)
{
	uint64_t min, max, range_min;

Please try this fix, keeping the previous two patches, and tell me if it works this time.

Thanks a lot!
Yordan
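
A simplified sketch of why a truncated timestamp can end in the reported assertion: if the jump target loses its upper bits, a time-based lookup sees every recorded entry as later than the requested time. This is not the KernelShark implementation. The function names, the sentinel values and the plain timestamp array are illustrative assumptions, and uint32_t again emulates a 32-bit size_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sentinels; the real values in libkshark may differ. */
#define ALL_GREATER (-1L)
#define ALL_SMALLER (-2L)

/*
 * Returns the index of the first timestamp >= time in a sorted array,
 * or a sentinel when every entry lies on one side of time.
 * Assumes n > 0.
 */
static long find_by_time(uint64_t time, const uint64_t *ts, size_t n)
{
	size_t lo = 0, hi = n;

	if (ts[0] > time)
		return ALL_GREATER;
	if (ts[n - 1] < time)
		return ALL_SMALLER;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ts[mid] < time)
			lo = mid + 1;
		else
			hi = mid;
	}

	return (long)lo;
}

/* Emulates passing the jump target through a 32-bit size_t parameter. */
static long lookup_after_32bit_jump(uint64_t ts, const uint64_t *data, size_t n)
{
	uint32_t narrowed = (uint32_t)ts; /* upper 32 bits are dropped */

	return find_by_time(narrowed, data, n);
}
```

With the timestamps { 5000000000, 5000001000, 5000002000 }, find_by_time(5000001000, ...) returns index 1, while lookup_after_32bit_jump(5000001000, ...) returns ALL_GREATER, which is the condition that an assertion like 'row != BSEARCH_ALL_GREATER' rejects. Declaring the parameter and the locals as uint64_t, as proposed above, avoids the narrowing.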
Comment 9 Alan Mikhak 2019-06-15 10:20:22 UTC
Hi Yordan,

I had to manually apply your two patches from 2019-06-14 on top of your first patch from 2019-06-12. I was able to build KernelShark on my Raspberry Pi 3 model B+ and observed that your combined changes resolved the abort issue as well as the compiler warnings. I also observed the same good results on my ODROID-XU3 from hardkernel.com, which also runs a 32-bit armv7l kernel, and on my 96Boards ROCK960 model C, which runs a 64-bit aarch64 kernel.

Please see the attached patch file which shows the combined changes I manually applied as your patches intended.

Regards,
Alan
Comment 10 Alan Mikhak 2019-06-15 10:21:39 UTC
Created attachment 283275 [details]
Combined patch for manual application of Yordan patches

Combined patch for manual application of patches from Yordan.
Comment 11 Yordan Karadzhov 2019-06-17 10:41:11 UTC
Hi Alan,
Thank you very much for helping us make KernelShark better.

I hope you will stay active, reporting bugs, or why not even sending patches to the mailing list.

cheers,
Yordan
Comment 12 Alan Mikhak 2019-06-18 01:51:25 UTC
Hi Yordan,

Glad I could do something useful. I hope to stay active.

Regards,
Alan
Comment 13 Alan Mikhak 2019-07-06 20:55:58 UTC
Hi Yordan,

I observed this abort issue again when I cloned a fresh copy of trace-cmd git repo today to compile on a brand new Raspberry Pi 4 model B.

git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git

I also observed the same compiler warnings that you had fixed in your patches.

Reopening the ticket to track until your patches go live.

Regards,
Alan
Comment 15 Yordan Karadzhov 2019-07-08 15:35:22 UTC
Hi Alan,

It was my fault to close the issue before the fix was actually pushed upstream.

It is upstream now, so please try again and tell us if it works or not.
You can look here

https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/log/

to see all patches that made it to the master branch of the repo.

Thanks a lot!
Y.
Comment 16 Alan Mikhak 2019-07-09 06:45:48 UTC
Hi Yordan,

I verified that KernelShark built successfully from the latest git repo without compiler warnings and didn't exhibit this abort issue on the following platforms:

Raspberry Pi 3 Model B+:

$ uname -a
Linux rpi6 4.19.57-v7+ #1244 SMP Thu Jul 4 18:45:25 BST 2019 armv7l GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Raspbian
Description:	Raspbian GNU/Linux 9.9 (stretch)
Release:	9.9
Codename:	stretch


Raspberry Pi 4 Model B:

$ uname -a
Linux rpi8 4.19.57-v7l+ #1244 SMP Thu Jul 4 18:48:07 BST 2019 armv7l GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Raspbian
Description:	Raspbian GNU/Linux 10 (buster)
Release:	10
Codename:	buster


Nano Pi M4:

$ uname -a
Linux nanopi1 4.4.167 #1 SMP Sun Jun 2 18:38:17 CST 2019 aarch64 aarch64 aarch64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.2 LTS
Release:	18.04
Codename:	bionic


All of the above platforms required building Qt 5 from sources to resolve the per-CPU visualization issue as documented in a separate ticket.

I observed the following trace-cmd compiler warning on the Raspberry Pi platforms. I was able to resolve the warning by installing the 'libaudit-dev' package as opposed to the 'libaudit-devel' package suggested by the warning:

$ make
:::
  COMPILE                trace-snapshot.o
  COMPILE                trace-stat.o
  COMPILE                trace-profile.o
trace-profile.c:23:3: warning: #warning "lib audit not found, using raw syscalls " "(install libaudit-devel and try again)" [-Wcpp]
 # warning "lib audit not found, using raw syscalls " \
   ^~~~~~~
  COMPILE                trace-stream.o
  COMPILE                trace-restore.o
  COMPILE                trace-check-events.o
:::

$ sudo apt install libaudit-devel
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package libaudit-devel

$ sudo apt install libaudit-dev

Regards,
Alan
Comment 17 Alan Mikhak 2019-07-09 08:36:15 UTC
RESOLVED as CODE_FIX
