Compiling code under /afs filesystem using make causes hang For me, this is regression and the latest working version was 6.7.2. I have not bisected this but first not working version was 6.7.3. 69815.691207] rcu: INFO: rcu_sched self-detected stall on CPU [69815.691212] rcu: 1-....: (17999 ticks this GP) idle=e61c/1/0x4000000000000000 softirq=2073613/2073613 fqs=5872 [69815.691216] rcu: (t=18000 jiffies g=3239965 q=24582 ncpus=4) [69815.691218] CPU: 1 PID: 132851 Comm: make Not tainted 6.7.4 #1 [69815.691220] Hardware name: LENOVO 20HF0047MX/20HF0047MX, BIOS N1WET68W (1.47 ) 07/21/2022 [69815.691221] RIP: 0010:strnlen+0x12/0x30 [69815.691225] Code: 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8d 14 37 48 85 f6 48 89 f8 75 0b eb 19 48 83 c0 01 <48> 39 c2 74 09 80 38 00 75 f2 48 29 f8 c3 48 89 d0 48 29 f8 c3 31 [69815.691227] RSP: 0018:ffffb2d10bab3d20 EFLAGS: 00000296 [69815.691229] RAX: ffff96cc13441730 RBX: ffff96cc1344172c RCX: 00000000000000ff [69815.691230] RDX: ffff96cc1344180c RSI: 00000000000000e0 RDI: ffff96cc1344172c [69815.691231] RBP: 0000000000000039 R08: 0000000000000034 R09: 00000000000000ff [69815.691232] R10: ffff96c7174db240 R11: 0000000000000039 R12: 000000000000003f [69815.691233] R13: ffff96cc13441000 R14: ffffb2d10bab3e30 R15: 0000000000000039 [69815.691234] FS: 00007f07daa03740(0000) GS:ffff96cc46680000(0000) knlGS:0000000000000000 [69815.691235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [69815.691236] CR2: 00005558841be002 CR3: 000000011222b006 CR4: 00000000003706f0 [69815.691237] Call Trace: [69815.691239] <IRQ> [69815.691240] ? rcu_dump_cpu_stacks+0xef/0x170 [69815.691244] ? rcu_sched_clock_irq+0x4ad/0xa60 [69815.691246] ? update_process_times+0x56/0x90 [69815.691248] ? tick_nohz_highres_handler+0x86/0xa0 [69815.691250] ? tick_sched_do_timer+0x80/0x80 [69815.691252] ? __hrtimer_run_queues+0xec/0x190 [69815.691254] ? hrtimer_interrupt+0xef/0x220 [69815.691256] ? __sysvec_apic_timer_interrupt+0x38/0x90 [69815.691259] ? sysvec_apic_timer_interrupt+0x69/0x90 [69815.691261] </IRQ> [69815.691262] <TASK> [69815.691262] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [69815.691265] ? strnlen+0x12/0x30 [69815.691268] afs_dir_iterate+0x5a3/0xb40 [kafs] [69815.691276] ? __mod_memcg_lruvec_state+0x7d/0xd0 [69815.691279] ? __mod_lruvec_page_state+0x76/0xf0 [69815.691281] ? folio_add_new_anon_rmap+0x3c/0xb0 [69815.691283] afs_readdir+0x2b/0x50 [kafs] [69815.691289] iterate_dir+0x75/0x120 [69815.691292] __x64_sys_getdents64+0x7f/0x120 [69815.691295] ? fillonedir+0x110/0x110 [69815.691297] do_syscall_64+0x56/0xd0 [69815.691299] ? __count_memcg_events+0x39/0x80 [69815.691301] ? handle_mm_fault+0x94/0x320 [69815.691303] ? exc_page_fault+0x33c/0x5c0 [69815.691306] entry_SYSCALL_64_after_hwframe+0x46/0x4e [69815.691311] RIP: 0033:0x7f07daac43a3 [69815.691313] Code: 89 df e8 d0 4b fc ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 51 5a 0f 00 f7 d8 [69815.691314] RSP: 002b:00007ffe0b56b1e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9 [69815.691315] RAX: ffffffffffffffda RBX: 00005558841bd640 RCX: 00007f07daac43a3 [69815.691316] RDX: 0000000000008000 RSI: 00005558841bd670 RDI: 0000000000000003 [69815.691317] RBP: 00005558841bd670 R08: 0000000005f77196 R09: 0000000000000004 [69815.691318] R10: 0000000000000000 R11: 0000000000000293 R12: ffffffffffffff88 [69815.691319] R13: 0000000000000000 R14: 000055588340b093 R15: 00000000ffffffff [69815.691321] </TASK> [69911.272520] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-.... } 18432 jiffies s: 26361 root: 0x2/. [69911.272529] rcu: blocking rcu_node structures (internal RCU debug): [69911.272531] Sending NMI from CPU 3 to CPUs 1: [69911.272535] NMI backtrace for cpu 1 [69911.272536] CPU: 1 PID: 132851 Comm: make Not tainted 6.7.4 #1 [69911.272539] Hardware name: LENOVO 20HF0047MX/20HF0047MX, BIOS N1WET68W (1.47 ) 07/21/2022 [69911.272539] RIP: 0010:afs_dir_iterate+0x58e/0xb40 [kafs] [69911.272552] Code: e8 67 6b d0 e0 e9 2a fb ff ff be 01 00 00 00 4c 89 e7 e8 55 6b d0 e0 e9 18 fb ff ff 44 89 fd be 40 00 00 00 48 89 e8 48 29 ee <48> c1 e0 05 48 c1 e6 05 49 8d 5c 05 0c 48 89 df e8 0d f1 03 e1 48 [69911.272553] RSP: 0018:ffffb2d10bab3d28 EFLAGS: 00000216 [69911.272555] RAX: 0000000000000025 RBX: ffff96cc1344146c RCX: 00000000000000ff [69911.272556] RDX: ffff96cc1344180c RSI: 000000000000001b RDI: 0000000000000001 [69911.272557] RBP: 0000000000000025 R08: 0000000000000025 R09: 00000000000000ff [69911.272558] R10: ffff96c7174db240 R11: 0000000000000025 R12: 000000000000003f [69911.272558] R13: ffff96cc13441000 R14: ffffb2d10bab3e30 R15: 0000000000000025 [69911.272559] FS: 00007f07daa03740(0000) GS:ffff96cc46680000(0000) knlGS:0000000000000000 [69911.272561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [69911.272562] CR2: 00005558841be002 CR3: 000000011222b006 CR4: 00000000003706f0 [69911.272563] Call Trace: [69911.272565] <NMI> [69911.272566] ? nmi_cpu_backtrace+0x80/0xf0 [69911.272569] ? nmi_cpu_backtrace_handler+0x8/0x10 [69911.272571] ? nmi_handle+0x4b/0xe0 [69911.272573] ? default_do_nmi+0x40/0x210 [69911.272576] ? exc_nmi+0xeb/0x110 [69911.272577] ? end_repeat_nmi+0xf/0x18 [69911.272579] ? afs_dir_iterate+0x58e/0xb40 [kafs] [69911.272587] ? afs_dir_iterate+0x58e/0xb40 [kafs] [69911.272595] ? afs_dir_iterate+0x58e/0xb40 [kafs] [69911.272602] </NMI> [69911.272603] <TASK> [69911.272604] ? __mod_memcg_lruvec_state+0x7d/0xd0 [69911.272606] ? __mod_lruvec_page_state+0x76/0xf0 [69911.272609] ? folio_add_new_anon_rmap+0x3c/0xb0 [69911.272611] afs_readdir+0x2b/0x50 [kafs] [69911.272618] iterate_dir+0x75/0x120 [69911.272622] __x64_sys_getdents64+0x7f/0x120 [69911.272624] ? fillonedir+0x110/0x110 [69911.272626] do_syscall_64+0x56/0xd0 [69911.272628] ? __count_memcg_events+0x39/0x80 [69911.272631] ? handle_mm_fault+0x94/0x320 [69911.272633] ? exc_page_fault+0x33c/0x5c0 [69911.272635] entry_SYSCALL_64_after_hwframe+0x46/0x4e [69911.272638] RIP: 0033:0x7f07daac43a3 [69911.272639] Code: 89 df e8 d0 4b fc ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 51 5a 0f 00 f7 d8 [69911.272641] RSP: 002b:00007ffe0b56b1e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9 [69911.272642] RAX: ffffffffffffffda RBX: 00005558841bd640 RCX: 00007f07daac43a3 [69911.272643] RDX: 0000000000008000 RSI: 00005558841bd670 RDI: 0000000000000003 [69911.272644] RBP: 00005558841bd670 R08: 0000000005f77196 R09: 0000000000000004 [69911.272645] R10: 0000000000000000 R11: 0000000000000293 R12: ffffffffffffff88 [69911.272646] R13: 0000000000000000 R14: 000055588340b093 R15: 00000000ffffffff etc....
From: Jeffrey Altman This commit is the fix for the mentioned issue. commit fe92f874f09145a6951deacaa4961390238bbe0d Author: Michael Lass <bevan@bi-co.net> Date: Wed Jan 31 16:52:20 2024 +0100 net: Fix from address in memcpy_to_iter_csum() While inlining csum_and_memcpy() into memcpy_to_iter_csum(), the from address passed to csum_partial_copy_nocheck() was accidentally changed. This causes a regression in applications using UDP, as for example OpenAFS, causing loss of datagrams.
This is almost certainly not the actual fix. The actual fix is probably this: https://lore.kernel.org/linux-fsdevel/786185.1708694102@warthog.procyon.org.uk/T/#u If a directory has a block with only ".__afsXXXX" files in it (from uncompleted silly-rename), these .__afsXXXX files are skipped but without advancing the file position in the dir_context. This leads to afs_dir_iterate() repeating the block again and again. Fix this by making the code that skips the .__afsXXXX file also manually advance the file position.