Created attachment 287357 [details]
dmesg (5.6.0-rc1 + v2 Fix DSI and ISI... patch, PowerMac G4 DP)

[...]
Feb 13 20:18:53 T600 kernel: ==================================================================
Feb 13 20:18:53 T600 kernel: BUG: KASAN: stack-out-of-bounds in test_bit+0x30/0x44
Feb 13 20:18:53 T600 kernel: Read of size 4 at addr ee8bddac by task systemd/1
Feb 13 20:18:53 T600 kernel: 
Feb 13 20:18:53 T600 kernel: CPU: 0 PID: 1 Comm: systemd Tainted: G W 5.6.0-rc1-PowerMacG4+ #20
Feb 13 20:18:53 T600 kernel: Call Trace:
Feb 13 20:18:53 T600 kernel: [ee8bdc38] [c078cf18] dump_stack+0xbc/0x118 (unreliable)
Feb 13 20:18:53 T600 kernel: [ee8bdc68] [c0249f94] print_address_description.isra.0+0x3c/0x420
Feb 13 20:18:53 T600 kernel: [ee8bdcf8] [c024a554] __kasan_report+0x138/0x180
Feb 13 20:18:53 T600 kernel: [ee8bdd38] [c0249718] kasan_report+0x7c/0x104
Feb 13 20:18:53 T600 kernel: [ee8bdd58] [c06526b4] test_bit+0x30/0x44
Feb 13 20:18:53 T600 kernel: [ee8bdd78] [c0657c6c] netlink_bind+0x24c/0x33c
Feb 13 20:18:53 T600 kernel: [ee8bde18] [c05c0c3c] __sys_bind+0xd4/0x120
Feb 13 20:18:53 T600 kernel: [ee8bdf38] [c001a278] ret_from_syscall+0x0/0x34
Feb 13 20:18:53 T600 kernel: --- interrupt: c01 at 0x4f3ea8 LR = 0x8f5b80
Feb 13 20:18:53 T600 kernel: 
Feb 13 20:18:53 T600 kernel: The buggy address belongs to the page:
Feb 13 20:18:53 T600 kernel: page:ef460a94 refcount:0 mapcount:0 mapping:00000000 index:0x0
Feb 13 20:18:53 T600 kernel: flags: 0x0()
Feb 13 20:18:53 T600 kernel: raw: 00000000 ef460a98 ef460a98 00000000 00000000 00000000 ffffffff 00000000
Feb 13 20:18:53 T600 kernel: raw: 00000000
Feb 13 20:18:53 T600 kernel: page dumped because: kasan: bad access detected
Feb 13 20:18:53 T600 kernel: 
Feb 13 20:18:53 T600 kernel: addr ee8bddac is located in stack of task systemd/1 at offset 36 in frame:
Feb 13 20:18:53 T600 kernel:  netlink_bind+0x0/0x33c
Feb 13 20:18:53 T600 kernel: 
Feb 13 20:18:53 T600 kernel: this frame has 1 object:
Feb 13 20:18:53 T600 kernel:  [32, 36) 'groups'
Feb 13 20:18:53 T600 kernel: 
Feb 13 20:18:53 T600 kernel: Memory state around the buggy address:
Feb 13 20:18:53 T600 kernel:  ee8bdc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Feb 13 20:18:53 T600 kernel:  ee8bdd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Feb 13 20:18:53 T600 kernel: >ee8bdd80: 00 f1 f1 f1 f1 04 f3 f3 f3 00 00 00 00 00 00 00
Feb 13 20:18:53 T600 kernel:                           ^
Feb 13 20:18:53 T600 kernel:  ee8bde00: 00 00 00 00 00 f1 f1 f1 f1 04 f2 04 f2 00 00 00
Feb 13 20:18:53 T600 kernel:  ee8bde80: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
Feb 13 20:18:53 T600 kernel: ==================================================================

This happens during boot on my G4 DP with kernel 5.6.0-rc1 and KASAN enabled (outline). The kernel is patched with Christophe's '[v2] powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK' (https://patchwork.ozlabs.org/patch/1237387/), but CONFIG_VMAP_STACK was not used here.
Created attachment 287359 [details] kernel .config (5.6.0-rc1, PowerMac G4 DP)
Probably a bug in or around netlink_bind() in net/netlink/af_netlink.c:

https://elixir.bootlin.com/linux/v5.6-rc1/source/net/netlink/af_netlink.c#L1017

Could you print the value of nlk->ngroups just before the loop that does the test_bit()? It should be 32 or less.
Bug introduced by commit cf5bddb95cbe ("net: bridge: vlan: add rtnetlink group and notify support").

RTNLGRP_MAX is now 33.

'unsigned long groups' is 32 bits long on PPC32.

The following loop in netlink_bind() overflows:

	for (group = 0; group < nlk->ngroups; group++) {
		if (!test_bit(group, &groups))
			continue;
		err = nlk->netlink_bind(net, group + 1);
		if (!err)
			continue;
		netlink_undo_bind(group, groups, sk);
		goto unlock;
	}

Should 'groups' be changed to 'unsigned long long'?
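To make the overflow concrete, here is a minimal userspace sketch (not kernel code). It assumes a 32-bit word, as 'unsigned long' is on PPC32, and uses the hypothetical helpers word_index() and out_of_bounds_reads() to show which loop iterations would make a test_bit()-style lookup read past a single-word bitmap. With 33 groups, bit 32 falls into word 1, which matches the KASAN report: 'groups' occupies [32, 36) in the frame and the bad read is at offset 36.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: models a 32-bit 'unsigned long', as on PPC32. */
#define WORD_BITS 32u

static_assert(sizeof(uint32_t) * 8 == WORD_BITS, "model assumes 32-bit words");

/* Hypothetical helper: index of the word a test_bit()-style lookup
 * would read for bit number 'nr'. */
static size_t word_index(unsigned int nr)
{
	return nr / WORD_BITS;
}

/* Hypothetical helper: number of iterations of a loop over
 * 0..ngroups-1 that would read beyond a one-word bitmap. */
static unsigned int out_of_bounds_reads(unsigned int ngroups)
{
	unsigned int group, count = 0;

	for (group = 0; group < ngroups; group++)
		if (word_index(group) != 0)
			count++;
	return count;
}
```

With ngroups at 32 or less every lookup stays in word 0; the 33rd group (bit 32) is the first one that lands in the next word.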
Feedback from Nikolay: I think we can just cap these at min(BITS_PER_TYPE(u32), nlk->ngroups) since "groups" is coming from sockaddr_nl's "nl_groups" which is a u32, for any groups beyond u32 one has to use setsockopt().
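A rough userspace sketch of the capping idea Nikolay describes, not the actual patch: U32_BITS stands in for BITS_PER_TYPE(u32), and capped_ngroups() and count_requested_groups() are hypothetical names used only for illustration. The point is that the loop bound never exceeds the width of the u32 mask, no matter how large ngroups is.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's BITS_PER_TYPE(u32); an assumption of this sketch. */
#define U32_BITS 32

/* Hypothetical helper: loop bound capped at the width of a u32 mask. */
static int capped_ngroups(int ngroups)
{
	return ngroups < U32_BITS ? ngroups : U32_BITS;
}

/* Hypothetical helper: count the groups requested in a u32 mask
 * without ever testing a bit number of 32 or more. */
static int count_requested_groups(uint32_t nl_groups, int ngroups)
{
	int group, count = 0;
	int limit = capped_ngroups(ngroups);

	assert(ngroups >= 0);
	for (group = 0; group < limit; group++)
		if (nl_groups & (UINT32_C(1) << group))
			count++;
	return count;
}
```

For example, with ngroups at 33 and every bit of the mask set, the loop still visits only bits 0 through 31.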
That's not a PPC32 bug but a networking bug affecting all 32-bit architectures.
Note that the bug wasn't introduced by my commit, but instead has been there since:

commit 4f520900522f
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue Apr 22 21:31:54 2014 -0400

    netlink: have netlink per-protocol bind function return an error code.

which moved the ngroups test_bit() to a local variable. My commit only exposed the bug since it added the 33rd group. I'm currently preparing a fix and will post it to netdev after verifying and testing it.
Fix landed in 5.6-rc3, works now as expected. Thanks!