Bug 207873

Summary: BUG at swapops + rcu stall + soft lockup when running the btrfs test suite (TEST=013\* ./misc-tests.sh)
Product: Platform Specific/Hardware
Reporter: Erhard F. (erhard_f)
Component: PPC-32
Assignee: platform_ppc-32
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: christophe.leroy, fs_btrfs
Priority: P1
Hardware: PPC-32
OS: Linux
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=207221
Kernel Version: 5.7-rc6
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel .config (5.7-rc6, PowerMac G4 DP)
kernel dmesg (5.7-rc6, PowerMac G4 DP)
screenshot 01
screenshot 02
transcript of both screenshots

Description Erhard F. 2020-05-23 22:21:55 UTC
Created attachment 289253
kernel .config (5.7-rc6, PowerMac G4 DP)

The bug is triggered by running "TEST=013\* ./misc-tests.sh" from the btrfs-progs test suite, built from git master:

 # git clone https://github.com/kdave/btrfs-progs && cd btrfs-progs/
 # ./autogen.sh && ./configure --disable-documentation
 # make && make fssum
 # cd tests/
 # TEST=013\* ./misc-tests.sh

The G4 crashes and the reboot timer kicks in. Before that, it prints a series of stack traces, starting with the "kernel BUG at include/linux/swapops.h:197!" part from bug #207221. After that I get an RCU stall and a soft lockup. For the full stack trace, see the transcript of both screenshots.

[...]
rcu: INFO: rcu_sched self-detected stall on CPU
rcu:     1-....: (7799 ticks this GP) idle=a06/1/0x40000002 softirq=11075/11075 fqs=2599
    (t=7804 jiffies g=21629 q=59)
Task dump for CPU 1:
dd              R  running task        0  2200    394 0x0000000c
Call Trace:
[f49fb458] [c00fcddc] sched_show_task+0x3bc/0x3fe (unreliable)
[f49fb498] [c01c650c] rcu_dump_cpu_stacks+0x228/0x23c
[f49fb4e8] [c01c2e18] rcu_sched_clock_irq+0x81c/0x1360
[f49fb568] [c01d8940] update_process_times+0x2c/0x98
[f49fb588] [c02027d4] tick_sched_timer+0x128/0x1d8
[f49fb5a8] [c01dc49c] __hrtimer_run_queues+0x490/0xae8
[f49fb698] [c01dd788] hrtimer_interrupt+0x278/0x520
[f49fb6f8] [c001710c] timer_interrupt+0x374/0xb4c
[f49fb738] [c002c5e4] ret_from_except+0x0/0x14
--- interrupt: 901 at do_raw_spin_lock+0x1c8/0x2cc
    LR = do_raw_spin_lock+0x1a4/0x2cc
[f49fb800] [c0180e0c] do_raw_spin_lock+0x188/0x2cc (unreliable)
[f49fb830] [c0428890] unmap_page_range+0x244/0xb08
[f49fb910] [c0429610] unmap_vmas+0x94/0xdc
[f49fb930] [c043c25c] exit_mmap+0x340/0x46c
[f49fba20] [c0078260] __mmput+0x78/0x360
[f49fba50] [c0090514] do_exit+0x9c4/0x21fc
[f49fbb20] [c0019d38] user_single_step_report+0x0/0x74
[f49fbb70] [c002c5e0] ret_from_except+0x0/0x4
--- interrupt: 700 at __migration_entry_wait+0x13c/0x198
    LR = __migration_entry_wait+0xf0/0x198
[f49fbc58] [c042c0f0] do_swap_page+0x1f0/0x198
[f49fbd28] [c042e7e4] handle_mm_fault+0x794/0x16f4
[f49fbe48] [c0039868] do_page_fault+0xf50/0x12f8
[f49fbf38] [c002c468] handle_page_fault+0x10/0x3c
--- interrupt: 301 at 0x87e378
    LR = 0x87e33c
[...]

I don't know whether this is a btrfs bug or a bug that is merely triggered by this specific test. I am filing it as platform specific because I have not seen it on x86 yet.

Unlike in bug #207221, KASAN is enabled here, so the stack trace looks slightly different.
Comment 1 Erhard F. 2020-05-23 22:22:27 UTC
Created attachment 289255
kernel dmesg (5.7-rc6, PowerMac G4 DP)
Comment 2 Erhard F. 2020-05-23 22:23:14 UTC
Created attachment 289257
screenshot 01
Comment 3 Erhard F. 2020-05-23 22:23:40 UTC
Created attachment 289259
screenshot 02
Comment 4 Erhard F. 2020-05-23 22:24:35 UTC
Created attachment 289261
transcript of both screenshots
Comment 6 Erhard F. 2020-05-24 17:04:46 UTC
(In reply to Christophe Leroy from comment #5)
> Try
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=40bb0e904212cf7d6f041a98c58c8341b2016670
Thanks for the hint! That patch did the trick. The btrfs test suite now completes fine, and building larger projects works without problems.

Closing this, as the fix seems to be going into -rc7.