Bug 216369 - trace_read_pipe peek a PADDING event and trying to print it
Summary: trace_read_pipe peek a PADDING event and trying to print it
Status: NEW
Alias: None
Product: Tracing/Profiling
Classification: Unclassified
Component: Ftrace (show other bugs)
Hardware: ARM Linux
: P1 normal
Assignee: Steven Rostedt
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-16 06:39 UTC by Jiazi Li
Modified: 2022-08-29 12:02 UTC (History)
0 users

See Also:
Kernel Version: 4.19、5.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Jiazi Li 2022-08-16 06:39:26 UTC
hi,

I encounter a crash issue, trace_read_pipe peek a PADDING event from ringbuffer, and trying to print it, resulting in kernel crash.

1. in ring_buffer_event_data, found this event's type_len is bigger than RINGBUF_TYPE_DATA_TYPE_LEN_MAX, print WARNING in dmesg.

[249033.677658] -(1)[9975:mobile_log_d.rd]WARNING: CPU: 1 PID: 9975 at ring_buffer_event_data+0x38/0x48
[249033.687108] -(1)[9975:mobile_log_d.rd]CPU: 1 PID: 9975 Comm: mobile_log_d.rd Tainted: P S      W  O      4.19.191-ge8b7b3424080-dirty #1
[249033.688703] -(1)[9975:mobile_log_d.rd]Hardware name: MT6769V/CZ (DT)
[249033.689566] -(1)[9975:mobile_log_d.rd]pstate: 60400005 (nZCv daif +PAN -UAO)
[249033.690519] -(1)[9975:mobile_log_d.rd]pc : ring_buffer_event_data+0x38/0x48
[249033.691460] -(1)[9975:mobile_log_d.rd]lr : __find_next_entry+0x120/0x230
[249033.692360] -(1)[9975:mobile_log_d.rd]sp : ffffff800b05bc30
[249033.693117] -(1)[9975:mobile_log_d.rd]x29: ffffff800b05bc70 x28: 0000000000000000 
[249033.694132] -(1)[9975:mobile_log_d.rd]x27: ffffffd10b553fe4 x26: 0000000000000006 
[249033.695146] -(1)[9975:mobile_log_d.rd]x25: ffffff927285dcc8 x24: ffffffd192e1c800 
[249033.696160] -(1)[9975:mobile_log_d.rd]x23: 0000000000000000 x22: ffffff927285d000 
[249033.697174] -(1)[9975:mobile_log_d.rd]x21: ffffffd1941540e8 x20: ffffffd194152000 
[249033.698188] -(1)[9975:mobile_log_d.rd]x19: ffffffd1941540e0 x18: 0000000000000000 
[249033.699201] -(1)[9975:mobile_log_d.rd]x17: 0000000000000000 x16: ffffff92722071dc 
[249033.700216] -(1)[9975:mobile_log_d.rd]x15: ffffff92723ad6ef x14: 000000000000003f 
[249033.701230] -(1)[9975:mobile_log_d.rd]x13: ffff0000ffffff00 x12: 000000000000000b 
[249033.702244] -(1)[9975:mobile_log_d.rd]x11: 0000000000000014 x10: 000000000000001e 
[249033.703258] -(1)[9975:mobile_log_d.rd]x9 : 000000000000001d x8 : ffffffd10b553fe4 
[249033.704272] -(1)[9975:mobile_log_d.rd]x7 : 0000000000000001 x6 : ffffffd194153176 
[249033.705285] -(1)[9975:mobile_log_d.rd]x5 : ffffffd10b553fc8 x4 : ffff0a01ffffff00 
[249033.706300] -(1)[9975:mobile_log_d.rd]x3 : ffffff800b05bc60 x2 : 0000000000000001 
[249033.707314] -(1)[9975:mobile_log_d.rd]x1 : 0000000000000000 x0 : ffffffd10b553fe4 
[249033.708331] -(1)[9975:mobile_log_d.rd]Call trace:
[249033.708989] -(1)[9975:mobile_log_d.rd] ring_buffer_event_data+0x38/0x48
[249033.709883] -(1)[9975:mobile_log_d.rd] tracing_read_pipe+0x22c/0x3ec
[249033.710745] -(1)[9975:mobile_log_d.rd] __vfs_read+0x44/0x13c
[249033.711516] -(1)[9975:mobile_log_d.rd] vfs_read+0xb8/0x150
[249033.712266] -(1)[9975:mobile_log_d.rd] ksys_read+0x6c/0xcc
[249033.713015] -(1)[9975:mobile_log_d.rd] __arm64_sys_read+0x18/0x20
[249033.713847] -(1)[9975:mobile_log_d.rd] el0_svc_common+0x98/0x160
[249033.714663] -(1)[9975:mobile_log_d.rd] el0_svc_handler+0x60/0x78
[249033.715482] -(1)[9975:mobile_log_d.rd] el0_svc+0x8/0x300

2. trace_read_pipe print this event resulting in a crash:

[249033.718312] .(1)[9975:mobile_log_d.rd]Unable to handle kernel paging request at virtual address 0031154100000006
[249033.719952] .(1)[9975:mobile_log_d.rd]Mem abort info:
[249033.720706] .(1)[9975:mobile_log_d.rd]  ESR = 0x96000004
[249033.721436] .(1)[9975:mobile_log_d.rd]  Exception class = DABT (current EL), IL = 32 bits
[249033.722586] .(1)[9975:mobile_log_d.rd]  SET = 0, FnV = 0
[249033.723619] .(1)[9975:mobile_log_d.rd]  EA = 0, S1PTW = 0
[249033.724413] .(1)[9975:mobile_log_d.rd]Data abort info:
[249033.725119] .(1)[9975:mobile_log_d.rd]  ISV = 0, ISS = 0x00000004
[249033.725944] .(1)[9975:mobile_log_d.rd]  CM = 0, WnR = 0
[249033.726685] .(1)[9975:mobile_log_d.rd][0031154100000006] address between user and kernel address ranges
[249033.728207] -(1)[9975:mobile_log_d.rd]Internal error: Oops: 96000004 [#1] PREEMPT SMP
[249033.762538] -(1)[9975:mobile_log_d.rd]Kernel Offset: 0x1269200000 from 0xffffff8008000000
[249033.763633] -(1)[9975:mobile_log_d.rd]PHYS_OFFSET: 0xffffffef40000000 
[249033.777647] -(1)[9975:mobile_log_d.rd]CPU: 1 PID: 9975 Comm: mobile_log_d.rd Tainted: P S      W  O      4.19.191-ge8b7b3424080-dirty #1
[249033.779236] -(1)[9975:mobile_log_d.rd]Hardware name: MT6769V/CZ (DT)
[249033.780090] -(1)[9975:mobile_log_d.rd]pstate: 20400005 (nzCv daif +PAN -UAO)
[249033.781039] -(1)[9975:mobile_log_d.rd]pc : string+0x38/0x74
[249033.781794] -(1)[9975:mobile_log_d.rd]lr : vsnprintf+0x430/0x65c
[249033.782601] -(1)[9975:mobile_log_d.rd]sp : ffffff800b05bad0
[249033.783351] -(1)[9975:mobile_log_d.rd]x29: ffffff800b05bad0 x28: ffffff800b05bb80 
[249033.784357] -(1)[9975:mobile_log_d.rd]x27: ffffff92723ad4c8 x26: ffffff92724ec0f6 
[249033.785361] -(1)[9975:mobile_log_d.rd]x25: ffffff92724ec0f8 x24: ffffffd1941531b3 
[249033.786367] -(1)[9975:mobile_log_d.rd]x23: 00000000ffffffff x22: 0000000000000002 
[249033.787372] -(1)[9975:mobile_log_d.rd]x21: ffffffd1941540a0 x20: ffffff800b05bb68 
[249033.788377] -(1)[9975:mobile_log_d.rd]x19: ffffffd1941531b2 x18: 0000000000000000 
[249033.789382] -(1)[9975:mobile_log_d.rd]x17: 0000000000000031 x16: ffffff92722072a0 
[249033.790387] -(1)[9975:mobile_log_d.rd]x15: ffffff92723ad6ef x14: 000000000000004e 
[249033.791392] -(1)[9975:mobile_log_d.rd]x13: ffff0000ffffff00 x12: 0000000000000000 
[249033.792397] -(1)[9975:mobile_log_d.rd]x11: ffffff92723ad6b8 x10: 1531154100000006 
[249033.793401] -(1)[9975:mobile_log_d.rd]x9 : 0000000000000000 x8 : ffffffd1941540a0 
[249033.794406] -(1)[9975:mobile_log_d.rd]x7 : 2020202020202020 x6 : ffffffd1941531b3 
[249033.795411] -(1)[9975:mobile_log_d.rd]x5 : ffffffd10b553fc8 x4 : ffff0a00ffffff04 
[249033.796415] -(1)[9975:mobile_log_d.rd]x3 : ffff0a00ffffff04 x2 : 1531154100000006 
[249033.797420] -(1)[9975:mobile_log_d.rd]x1 : ffffffffffffffff x0 : ffffffd1941531b3 
[249033.798430] -(1)[9975:mobile_log_d.rd]CPU: 1 PID: 9975 Comm: mobile_log_d.rd Tainted: P S      W  O      4.19.191-ge8b7b3424080-dirty #1
[249033.800018] -(1)[9975:mobile_log_d.rd]Hardware name: MT6769V/CZ (DT)
[249033.800868] -(1)[9975:mobile_log_d.rd]Call trace:
[249033.801521] -(1)[9975:mobile_log_d.rd] dump_backtrace+0x0/0x158
[249033.802320] -(1)[9975:mobile_log_d.rd] show_stack+0x14/0x1c
[249033.803076] -(1)[9975:mobile_log_d.rd] dump_stack+0xb8/0xf0
[249033.803836] -(1)[9975:mobile_log_d.rd] mrdump_common_die+0x170/0x214
[249033.804689] -(1)[9975:mobile_log_d.rd] ipanic_die+0x30/0x40
[249033.805445] -(1)[9975:mobile_log_d.rd] notify_die+0x64/0xb4
[249033.806200] -(1)[9975:mobile_log_d.rd] die+0xac/0x37c
[249033.806891] -(1)[9975:mobile_log_d.rd] die_kernel_fault+0x60/0x70
[249033.807710] -(1)[9975:mobile_log_d.rd] die_kernel_fault+0x0/0x70
[249033.808518] -(1)[9975:mobile_log_d.rd] do_bad_area+0x30/0x130
[249033.809296] -(1)[9975:mobile_log_d.rd] do_translation_fault+0x34/0x40
[249033.810158] -(1)[9975:mobile_log_d.rd] do_mem_abort+0x4c/0xf8
[249033.810935] -(1)[9975:mobile_log_d.rd] el1_da+0x1c/0xac
[249033.811647] -(1)[9975:mobile_log_d.rd] string+0x38/0x74
[249033.812357] -(1)[9975:mobile_log_d.rd] vsnprintf+0x430/0x65c
[249033.813122] -(1)[9975:mobile_log_d.rd] seq_buf_vprintf+0x64/0xcc
[249033.813934] -(1)[9975:mobile_log_d.rd] trace_seq_printf+0x70/0xd8
[249033.814755] -(1)[9975:mobile_log_d.rd] trace_raw_output_ipi_handler+0x40/0x54
[249033.815705] -(1)[9975:mobile_log_d.rd] print_trace_line+0x3ec/0x424
[249033.816547] -(1)[9975:mobile_log_d.rd] tracing_read_pipe+0x28c/0x3ec
[249033.817402] -(1)[9975:mobile_log_d.rd] __vfs_read+0x44/0x13c
[249033.818166] -(1)[9975:mobile_log_d.rd] vfs_read+0xb8/0x150
[249033.818908] -(1)[9975:mobile_log_d.rd] ksys_read+0x6c/0xcc
[249033.819651] -(1)[9975:mobile_log_d.rd] __arm64_sys_read+0x18/0x20
[249033.820470] -(1)[9975:mobile_log_d.rd] el0_svc_common+0x98/0x160
[249033.821278] -(1)[9975:mobile_log_d.rd] el0_svc_handler+0x60/0x78
[249033.822086] -(1)[9975:mobile_log_d.rd] el0_svc+0x8/0x300

I do not enable ipi event, array[0] of this padding event is 0x18, in my device,
ipi entry trace event's type value is 0x18 too, so trace_read_pipe call trace_raw_output_ipi_handler to print this event,eventually, the system crashed because of illegal pointers.

PADDING event should not return to caller in ring_buffer_peek.
Comment 1 Jiazi Li 2022-08-29 12:02:22 UTC
hi,

This issue has happened many times. Every time peek_next_entry return a padding event at end of page.

Our reader task poll trace_pipe and other files.
Maybe this issue will triggered in the following scenarios:
1. reader task wakeup by other file.
2. after read data from other file, call ring_buffer_poll_wait and find ring_buffer no empty because read != commit,
but because of cpu out-of-order execution, next event has not been set to padding type.
3. in ring_buffer_peek, expired data in this event show it is not a padding event, and return event to peek_next_entry.
4. event is set to padding type and detected in rb_event_data, then print a warning.
5. print_trace_line use error type to print this event, eventually crash.

Maybe we can add smp_wb to make ture padding type will be set first:
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2681,6 +2681,7 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,

        rb_reset_tail(cpu_buffer, tail, info);

+       smp_wb();
        /* Commit what we have for now. */
        rb_end_commit(cpu_buffer);
        /* rb_end_commit() decs committing */

Note You need to log in before you can comment on or make changes to this bug.