Bug 219543

Summary: libtracefs: testsuite fail on s390x (segfault)
Product: Tools
Reporter: Pragyansh Chaturvedi (pragyanshchaturvedi18)
Component: Trace-cmd/Kernelshark
Assignee: Default virtual assignee for Trace-cmd and kernelshark (tools_tracecmd_kernelshark)
Status: NEW
Severity: normal
CC: pragyanshchaturvedi18, rostedt
Priority: P3
Hardware: S390-64
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: endianness-fix.patch
attachment-4622-0.html

Description Pragyansh Chaturvedi 2024-11-29 13:54:32 UTC
Hi
While working on the libtracefs 1.8.0/1.8.1 package in Ubuntu, we
observed the test suite segfaulting on s390x.
Example run: https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/libt/libtracefs/20240417_184123_8ab96@/log.gz 

Upon investigating, it turns out to be an endianness issue in libtraceevent.
In libtraceevent/src/event-parse.c, we have:

/**
 * tep_alloc - create a tep handle
 */
struct tep_handle *tep_alloc(void)
{
	struct tep_handle *tep = calloc(1, sizeof(*tep));

	if (tep) {
		tep->ref_count = 1;
		tep->host_bigendian = tep_is_bigendian();
	}

	return tep;
}

So on s390x, tep->host_bigendian is TEP_BIG_ENDIAN, but tep->file_bigendian is never assigned: calloc() zeroes the handle, so it stays at the default value (TEP_LITTLE_ENDIAN).
Then in libtracefs/src/kbuffer_parse.c, we have:

enum {
	KBUFFER_FL_HOST_BIG_ENDIAN = (1<<0),
	KBUFFER_FL_BIG_ENDIAN = (1<<1),
	KBUFFER_FL_LONG_8 = (1<<2),
	KBUFFER_FL_OLD_FORMAT = (1<<3),
};

#define ENDIAN_MASK (KBUFFER_FL_HOST_BIG_ENDIAN | KBUFFER_FL_BIG_ENDIAN)

...

static int do_swap(struct kbuffer *kbuf)
{
	return ((kbuf->flags & KBUFFER_FL_HOST_BIG_ENDIAN) + kbuf->flags) &
		ENDIAN_MASK;
}

kbuf->flags is populated from the tep_handle object. The tests therefore fail because libtraceevent assumes the files it opens are stored in little endian format, while on s390x they are actually big endian, so do_swap() asks for a byte swap that should not happen.
We can assume by default that the host and FS endianness are the same. If they differ, the user must set the correct endianness through the event-parse API (tep_set_file_bigendian).
After applying my fix, the run summary of utest/trace-utest of libtracefs:

Run Summary:    Type     Total       Ran    Passed  Failed  Inactive
              suites         1         1       n/a       0         0
               tests        36        36        35       1         0
             asserts  16407066  16407066  16407064       2       n/a

Elapsed time = 22.623 seconds

Diff to apply:

--- a/src/event-parse.c	2024-11-20 01:08:35.806823782 +0530
+++ b/src/event-parse.c	2024-11-22 13:13:50.775966911 +0530
@@ -8501,6 +8501,12 @@
 	if (tep) {
 		tep->ref_count = 1;
 		tep->host_bigendian = tep_is_bigendian();
+
+		// We can make the following safe assumption
+		// for the default case. Else it leaves the
+		// file endianness as little endian and breaks
+		// things on big endian architectures.
+		tep->file_bigendian = tep->host_bigendian;
 	}
 
 	return tep;
Comment 1 Pragyansh Chaturvedi 2024-11-29 14:05:59 UTC
Created attachment 307303
endianness-fix.patch

Patch to fix endianness issues on s390x
Comment 2 Pragyansh Chaturvedi 2024-12-05 08:09:54 UTC
Hi, bumping this for visibility
Comment 3 Tzvetomir Stoyanov 2024-12-05 08:10:05 UTC
Created attachment 307324
attachment-4622-0.html

Please note: The recipient's email address has changed to tzvetomir.stoyanov@broadcom.com.
Your original message has been forwarded to this recipient's new Broadcom email account. Please do not use the recipient's @vmware.com email address going forward.
Comment 4 Steven Rostedt 2024-12-05 14:53:56 UTC
Thanks, can you send a formal patch to linux-trace-devel@vger.kernel.org?

Please follow the Linux kernel patch submission process (proper change log and Signed-off-by). And the comment should use the format:

 /*
  * For multi line
  * comments.
  */

Thanks!

-- Steve