The procfs stressor from the stress-ng project triggers a kernel BUG in the 5.10.0-rc1 kernel on multiple architectures.

x86_64:

[root@localhost stress-ng]# ./stress-ng --procfs 0 --timeout 60
stress-ng: info: [3031466] dispatching hogs: 4 procfs
[ 974.088011] ICMPv6: process `stress-ng-procf' is using deprecated sysctl (syscall) net.ipv6.neigh.enp0s29u1u1u5.retrans_time - use net.ipv6.neigh.enp0s29u1u1u5.retrans_time_ms instead
[ 984.137351] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-128' (offset 127, size 3)!
[ 984.148917] ------------[ cut here ]------------
[ 984.154089] kernel BUG at mm/usercopy.c:99!
[ 984.158813] invalid opcode: 0000 [#1] SMP PTI
[ 984.163771] CPU: 0 PID: 3031471 Comm: stress-ng-procf Tainted: G I 5.10.0-rc1 #1
[ 984.173483] Hardware name: IBM IBM System X3250 M4 -[2583AC1]-/00D3729, BIOS -[JQE158AUS-1.05]- 07/23/2013
[ 984.184260] RIP: 0010:usercopy_abort+0x74/0x76
[ 984.189219] Code: 67 5c 8b 51 48 0f 45 d6 49 c7 c3 73 f7 5f 8b 4c 89 d1 57 48 c7 c6 68 57 5e 8b 48 c7 c7 38 f8 5f 8b 49 0f 45 f3 e8 13 71 ff ff <0f> 0b 4c 89 e1 49 89 d8 44 89 ea 31 f6 48 29 c1 48 c7 c7 b5 f7 5f
[ 984.210177] RSP: 0018:ffff9c1f007b3dc0 EFLAGS: 00010286
[ 984.216000] RAX: 0000000000000066 RBX: 0000000000000003 RCX: 0000000000000000
[ 984.223965] RDX: ffff911f37c27e20 RSI: ffff911f37c19050 RDI: ffff911f37c19050
[ 984.231929] RBP: ffff911e04cd1f82 R08: 0000000000000000 R09: 0000000000000000
[ 984.239893] R10: ffff9c1f007b3bf8 R11: ffffffff8bd711a8 R12: ffff911e04cd1f7f
[ 984.247857] R13: 0000000000000001 R14: 0000000000000003 R15: ffff911e009b19c0
[ 984.255821] FS: 00007fbabb42b180(0000) GS:ffff911f37c00000(0000) knlGS:0000000000000000
[ 984.264915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 984.271520] CR2: 00007fbabb7ef000 CR3: 000000014e296001 CR4: 00000000001706f0
[ 984.279683] Call Trace:
[ 984.282581] __check_heap_object+0xe0/0x110
[ 984.287405] __check_object_size+0x136/0x150
[ 984.292347] proc_sys_call_handler+0x167/0x250
[ 984.297565] new_sync_read+0x108/0x180
[ 984.302082] vfs_read+0x174/0x1d0
[ 984.306126] ksys_read+0x58/0xd0
[ 984.310022] do_syscall_64+0x33/0x40
[ 984.314277] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 984.320201] RIP: 0033:0x7fbabb6099ac
[ 984.324514] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 89 fc ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 bf fc ff ff 48
[ 984.346368] RSP: 002b:00007fff47397340 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 984.355402] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbabb6099ac
[ 984.363971] RDX: 0000000000000060 RSI: 00007fff47397390 RDI: 0000000000000006
[ 984.372583] RBP: 0000000000000006 R08: 0000000000000000 R09: 0000000000000000
[ 984.381093] R10: 00000000000fa2b4 R11: 0000000000000246 R12: 0000000000000003
[ 984.389577] R13: 00007fff473a3630 R14: 0000000000001000 R15: 0000000000000060
[ 984.398087] Modules linked in: binfmt_misc rfkill sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal mgag200 intel_powerclamp iTCO_wdt i2c_algo_bit coretemp intel_pmc_bxt cdc_ether gpio_ich drm_kms_helper iTCO_vendor_support usbnet mii cec rapl ipmi_ssif i2c_i801 intel_cstate e1000e intel_uncore ie31200_edac pcspkr ipmi_si i2c_smbus lpc_ich ipmi_devintf ipmi_msghandler drm ip_tables xfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ata_generic pata_acpi wmi
[ 984.449316] ---[ end trace d44739bb135b1e63 ]---
[ 984.455360] RIP: 0010:usercopy_abort+0x74/0x76
[ 984.461181] Code: 67 5c 8b 51 48 0f 45 d6 49 c7 c3 73 f7 5f 8b 4c 89 d1 57 48 c7 c6 68 57 5e 8b 48 c7 c7 38 f8 5f 8b 49 0f 45 f3 e8 13 71 ff ff <0f> 0b 4c 89 e1 49 89 d8 44 89 ea 31 f6 48 29 c1 48 c7 c7 b5 f7 5f
[ 984.483379] RSP: 0018:ffff9c1f007b3dc0 EFLAGS: 00010286
[ 984.489416] RAX: 0000000000000066 RBX: 0000000000000003 RCX: 0000000000000000
[ 984.497965] RDX: ffff911f37c27e20 RSI: ffff911f37c19050 RDI: ffff911f37c19050
[ 984.507102] RBP: ffff911e04cd1f82 R08: 0000000000000000 R09: 0000000000000000
[ 984.515588] R10: ffff9c1f007b3bf8 R11: ffffffff8bd711a8 R12: ffff911e04cd1f7f
[ 984.524474] R13: 0000000000000001 R14: 0000000000000003 R15: ffff911e009b19c0
[ 984.532878] FS: 00007fbabb42b180(0000) GS:ffff911f37c00000(0000) knlGS:0000000000000000
[ 984.542084] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 984.548804] CR2: 00007fbabb7ef000 CR3: 000000014e296001 CR4: 00000000001706f0

aarch64 (arm64):

[root@localhost stress-ng]# ./stress-ng --procfs 0
stress-ng: info: [44802] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [44802] dispatching hogs: 32 procfs
stress-ng: info: [44802] cache allocate: using defaults, can't determine cache details from sysfs
[ 2934.501319] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-128' (offset 82, size 73)!
[ 2934.516649] ------------[ cut here ]------------
[ 2934.524448] kernel BUG at mm/usercopy.c:99!
[ 2934.532208] Internal error: Oops - BUG: 0 [#1] SMP
[ 2934.539950] Modules linked in: rfkill sunrpc nicvf cavium_ptp joydev nicpf cavium_rng_vf thunder_bgx thunder_xcv mdio_thunder cavium_rng mdio_cavium thunderx_edac ipmi_ssif ipmi_devintf ipmi_msghandler vfat fat ip_tables xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce gpio_keys i2c_thunderx thunderx_mmc aes_neon_bs
[ 2934.540737] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-128' (offset 55, size 108)!
[ 2934.550255] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-128' (offset 86, size 68)!
[ 2934.550297] ------------[ cut here ]------------
[ 2934.550300] kernel BUG at mm/usercopy.c:99!
[ 2934.589488] CPU: 6 PID: 44874 Comm: stress-ng-procf Not tainted 5.10.0-rc1 #1
[ 2934.589492] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019
[ 2934.589497] pstate: 40400005 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[ 2934.589507] pc : usercopy_abort+0x98/0x9c
[ 2934.589511] lr : usercopy_abort+0x98/0x9c
[ 2934.589518] sp : ffff80007e14bc70
[ 2934.603274] ------------[ cut here ]------------
[ 2934.616904] x29: ffff80007e14bc80 x28: ffff00013a6d2a80
[ 2934.624799] kernel BUG at mm/usercopy.c:99!
[ 2934.707971]
[ 2934.707977] x27: 0000000000000000 x26: 0000000000000000
[ 2934.721063] x25: ffff80007e14bd30 x24: 0000000000000000
[ 2934.729652] x23: ffff000101e8a540 x22: ffff000149809a9b
[ 2934.738181] x21: 0000000000000001 x20: 0000000000000049
[ 2934.746646] x19: ffff000149809a52 x18: 0000000000000000
[ 2934.755018] x17: 0000000000000000 x16: 0000000000000000
[ 2934.763363] x15: 0000000000aaaaaa x14: 0000000000000020
[ 2934.771682] x13: 00000000000117ca x12: ffff8000120bbe00
[ 2934.780114] x11: 0000000000000003 x10: ffff80001208be18
[ 2934.788496] x9 : ffff8000102310c0 x8 : ffff80001208bdc0
[ 2934.796849] x7 : 0000000000000001 x6 : 0000000000000000
[ 2934.804954] x5 : 0000000000000000 x4 : ffff000ff63af410
[ 2934.813101] x3 : ffff000ff63be340 x2 : ffff000ff63af410
[ 2934.821102] x1 : ffff00013a6d2a80 x0 : 0000000000000066
[ 2934.829282] Call trace:
[ 2934.834432] usercopy_abort+0x98/0x9c
[ 2934.840907] __check_heap_object+0x124/0x138
[ 2934.847889] __check_object_size+0x190/0x210
[ 2934.854815] proc_sys_call_handler+0x154/0x220
[ 2934.861877] proc_sys_read+0x1c/0x28
[ 2934.868165] new_sync_read+0xdc/0x158
[ 2934.874521] vfs_read+0x150/0x1e0
[ 2934.880382] ksys_read+0x60/0xe8
[ 2934.886226] __arm64_sys_read+0x24/0x30
[ 2934.892799] el0_svc_common.constprop.0+0xac/0x1e0
[ 2934.900356] do_el0_svc+0x2c/0x98
[ 2934.906348] el0_sync_handler+0xb0/0xb8
[ 2934.912797] el0_sync+0x178/0x180
[ 2934.918675] Code: aa0003e3 f0002620 911f8000 97fff564 (d4210000)
[ 2934.927269] ---[ end trace ed6d63c40907130f ]---
s390x:

[ 2169.264889] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-32' (offset 2, size 32)!
[ 2169.264913] ------------[ cut here ]------------
[ 2169.264915] kernel BUG at mm/usercopy.c:99!
[ 2169.264978] monitor event: 0040 ilc:2 [#1] SMP
[ 2169.264984] Modules linked in: zfcp scsi_transport_fc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc qeth_l2 vmur vfio_ccw vfio_mdev mdev vfio_iommu_type1 zcrypt_cex4 vfio drm drm_panel_orientation_quirks backlight i2c_core ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_fba_mod lcs dasd_eckd_mod dasd_mod ctcm qeth qdio ccwgroup fsm pkey zcrypt
[ 2169.265023] CPU: 0 PID: 29975 Comm: stress-ng-procf Kdump: loaded Not tainted 5.9.0 #1
[ 2169.265026] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
[ 2169.265030] Krnl PSW : 0704e00180000000 0000000008b47cfa (usercopy_abort+0xaa/0xb0)
[ 2169.265040] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 2169.266838] Krnl GPRS: 0000000000000074 00000000091b4ce8 0000000000000064 0000000075002a08
[ 2169.266840] 0000000075011800 0000000000000000 0000000008da3f8c 000003ffdb1f8780
[ 2169.266842] 0000000000000002 0000000000000054 0000000000000001 0000000008d9367e
[ 2169.266843] 000000002ce22000 0000000009049bd0 0000000008b47cf6 000003e00599fbf8
[ 2169.266852] Krnl Code: 0000000008b47cea: c0200012e188 larl %r2,0000000008da3ffa
[ 2169.266852] 0000000008b47cf0: c0e5ffffec68 brasl %r14,0000000008b455c0
[ 2169.266852] #0000000008b47cf6: af000000 mc 0,0
[ 2169.266852] >0000000008b47cfa: 0707 bcr 0,%r7
[ 2169.266852] 0000000008b47cfc: 0707 bcr 0,%r7
[ 2169.266852] 0000000008b47cfe: 0707 bcr 0,%r7
[ 2169.266852] 0000000008b47d00: c00400000000 brcl 0,0000000008b47d00
[ 2169.266852] 0000000008b47d06: eb5ff0400024 stmg %r5,%r15,64(%r15)
[ 2169.266867] Call Trace:
[ 2169.266876] [<0000000008b47cfa>] usercopy_abort+0xaa/0xb0
[ 2169.266878] ([<0000000008b47cf6>] usercopy_abort+0xa6/0xb0)
[ 2169.266883] [<00000000083790a0>] __check_heap_object+0x128/0x140
[ 2169.266885] [<00000000083976cc>] __check_object_size+0x134/0x1f8
[ 2169.266889] [<0000000008467296>] proc_sys_call_handler+0x126/0x220
[ 2169.266891] [<00000000084673ee>] proc_sys_read+0x26/0x38
[ 2169.266895] [<000000000839fa0c>] vfs_read+0x94/0x190
[ 2169.266897] [<000000000839fd80>] ksys_read+0x68/0xf8
[ 2169.266899] [<0000000008b5c994>] system_call+0xe0/0x2b0
[ 2169.266900] Last Breaking-Event-Address:
[ 2169.267481] [<0000000008b5dbb0>] __s390_indirect_jump_r14+0x0/0xc
[ 2169.267486] ---[ end trace a234fa69d7c0afcf ]---
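A note on reading the "offset N, size M" figures in the usercopy messages: in each report the copy starts inside the named kmalloc object but would run past its end (127 + 3 > 128, 82 + 73 > 128, 2 + 32 > 32). Below is a minimal sketch of that bounds test, assuming the whole object is eligible for copying to user space. The helper name copy_fits_in_object is made up for illustration, and the real check in the kernel has more cases than this.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of a hardened-usercopy bounds check for one slab
 * object. Illustration only: this is NOT the kernel's
 * __check_heap_object(), and the function name is hypothetical.
 *
 * object_size: usable size of the object (e.g. 128 for kmalloc-128)
 * offset:      where the copy starts, relative to the object start
 * n:           number of bytes to copy
 */
static bool copy_fits_in_object(size_t object_size, size_t offset, size_t n)
{
	/* Compare against the remaining space so offset + n cannot overflow. */
	return offset <= object_size && n <= object_size - offset;
}
```

With the values from the traces, copy_fits_in_object(128, 127, 3) and copy_fits_in_object(32, 2, 32) are both false, which matches the abort. An odd starting offset such as 127 also suggests that the pointer being copied from does not sit at the start of an object.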
It seems the bug might be related to locks. If I boot with maxcpus=1, the bug goes away.

[root@localhost stress-ng]# uname -r
5.10.0-rc1
[root@localhost stress-ng]# lscpu | grep On-line
On-line CPU(s) list: 0
[root@localhost stress-ng]# ./stress-ng --procfs 0 --timeout 15
stress-ng: info: [27068] dispatching hogs: 4 procfs
stress-ng: info: [27068] successful run completed in 15.02s
[root@localhost stress-ng]# ./stress-ng --procfs 0 --timeout 15
stress-ng: info: [27093] dispatching hogs: 4 procfs
stress-ng: info: [27093] successful run completed in 15.03s
[root@localhost stress-ng]# ./stress-ng --procfs 0 --timeout 15
stress-ng: info: [27114] dispatching hogs: 4 procfs
stress-ng: info: [27114] successful run completed in 15.26s

[root@localhost stress-ng]# grubby --remove-args='maxcpus=1' --update-kernel=DEFAULT
[root@localhost stress-ng]# reboot
...
...
[root@localhost stress-ng]# lscpu | grep On-line
On-line CPU(s) list: 0-3
[root@localhost stress-ng]# ./stress-ng --procfs 0 --timeout 15
stress-ng: info: [914] dispatching hogs: 4 procfs
[ 92.042429] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-128' (offset 93, size 108)!
[ 92.054189] ------------[ cut here ]------------
[ 92.059370] kernel BUG at mm/usercopy.c:99!
...
...
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 28 Oct 2020 15:49:15 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=209919
>
> Bug ID: 209919
> Summary: kernel BUG at mm/usercopy.c:99 from stress-ng procfs
> Product: Memory Management
> Version: 2.5
> Kernel Version: 5.10.0-rc1
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: akpm@linux-foundation.org
> Reporter: jbastian@redhat.com
> Regression: No
>
> The procfs stressor from the stress-ng project triggers a kernel BUG in the
> 5.10.0-rc1 kernel on multiple architectures.

Thanks. A question from Kees, below...

> x86_64:
>
> [root@localhost stress-ng]# ./stress-ng --procfs 0 --timeout 60
> stress-ng: info: [3031466] dispatching hogs: 4 procfs
> [ 974.088011] ICMPv6: process `stress-ng-procf' is using deprecated sysctl
> (syscall) net.ipv6.neigh.enp0s29u1u1u5.retrans_time - use
> net.ipv6.neigh.enp0s29u1u1u5.retrans_time_ms instead
> [ 984.137351] usercopy: Kernel memory exposure attempt detected from SLUB
> object 'kmalloc-128' (offset 127, size 3)!
> [ 984.148917] ------------[ cut here ]------------
> [ 984.154089] kernel BUG at mm/usercopy.c:99!
> [ 984.158813] invalid opcode: 0000 [#1] SMP PTI
> [ 984.163771] CPU: 0 PID: 3031471 Comm: stress-ng-procf Tainted: G
> I
> 5.10.0-rc1 #1
> [ 984.173483] Hardware name: IBM IBM System X3250 M4 -[2583AC1]-/00D3729,
> BIOS
> -[JQE158AUS-1.05]- 07/23/2013
> [ 984.184260] RIP: 0010:usercopy_abort+0x74/0x76
> [ 984.189219] Code: 67 5c 8b 51 48 0f 45 d6 49 c7 c3 73 f7 5f 8b 4c 89 d1 57
> 48 c7 c6 68 57 5e 8b 48 c7 c7 38 f8 5f 8b 49 0f 45 f3 e8 13 71 ff ff <0f> 0b
> 4c
> 89 e1 49 89 d8 44 89 ea 31 f6 48 29 c1 48 c7 c7 b5 f7 5f
> [ 984.210177] RSP: 0018:ffff9c1f007b3dc0 EFLAGS: 00010286
> [ 984.216000] RAX: 0000000000000066 RBX: 0000000000000003 RCX:
> 0000000000000000
> [ 984.223965] RDX: ffff911f37c27e20 RSI: ffff911f37c19050 RDI:
> ffff911f37c19050
> [ 984.231929] RBP: ffff911e04cd1f82 R08: 0000000000000000 R09:
> 0000000000000000
> [ 984.239893] R10: ffff9c1f007b3bf8 R11: ffffffff8bd711a8 R12:
> ffff911e04cd1f7f
> [ 984.247857] R13: 0000000000000001 R14: 0000000000000003 R15:
> ffff911e009b19c0
> [ 984.255821] FS: 00007fbabb42b180(0000) GS:ffff911f37c00000(0000)
> knlGS:0000000000000000
> [ 984.264915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 984.271520] CR2: 00007fbabb7ef000 CR3: 000000014e296001 CR4:
> 00000000001706f0
> [ 984.279683] Call Trace:
> [ 984.282581] __check_heap_object+0xe0/0x110
> [ 984.287405] __check_object_size+0x136/0x150
> [ 984.292347] proc_sys_call_handler+0x167/0x250
> [ 984.297565] new_sync_read+0x108/0x180
> [ 984.302082] vfs_read+0x174/0x1d0
> [ 984.306126] ksys_read+0x58/0xd0
> [ 984.310022] do_syscall_64+0x33/0x40
> [ 984.314277] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Can we determine which /proc/sys entries these are?
To see which file it is, I suggest adding the following to stress-ng:

diff --git a/stress-procfs.c b/stress-procfs.c
index 3ffda881..77010828 100644
--- a/stress-procfs.c
+++ b/stress-procfs.c
@@ -428,11 +428,13 @@ static void stress_proc_dir(
 			ret = shim_pthread_spin_lock(&lock);
 			if (!ret) {
 				(void)stress_mk_filename(tmp, sizeof(tmp), path, d->d_name);
+				printf("FILE: %s\n", tmp);
 				(void)shim_strlcpy(proc_path, tmp, sizeof(proc_path));
 				(void)shim_pthread_spin_unlock(&lock);
 				stress_proc_rw(ctxt, loops);
 				inc_counter(args);
+				sleep(1);
 			}
 			free(d);
 			dlist[i] = NULL;

It will slowly work through the /proc files and print out the name of the entry it is stressing. Be patient.
Seems that simple random-sized reads cause an issue. Got a reproducer:

#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
	/* Reopen the file forever; each pass restarts at offset 0 */
	for (;;) {
		int fd;
		ssize_t i = 0;

		fd = open("/proc/sys/kernel/sched_domain/cpu0/domain0/flags", O_RDONLY);
		if (fd < 0)
			exit(1);

		/* Issue reads of random size until one fails or comes up short */
		while (i < (4096 * 4096)) {
			char buffer[4096];
			ssize_t ret, sz = 1 + (random() % sizeof(buffer));

			ret = read(fd, buffer, sz);
			if (ret < 0)
				break;
			if (ret < sz)
				break;
			i += sz;
		}
		(void)close(fd);
	}
	return 0;
}
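One possible way to narrow the reproducer further is to read the file in small fixed-size chunks, so that every read() after the first starts at a non-zero file offset. The sketch below assumes that a non-zero offset is what matters here. That assumption has not been confirmed against the affected kernel, and read_in_chunks is a hypothetical helper, not part of stress-ng.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `path` to EOF using read() calls of at most `chunk` bytes, so
 * that every read after the first starts at a non-zero offset.
 * Returns the total number of bytes read, or -1 on error.
 * Hypothetical helper for illustration; not taken from stress-ng.
 */
static ssize_t read_in_chunks(const char *path, size_t chunk)
{
	char buffer[4096];
	ssize_t total = 0;
	int fd;

	if (chunk == 0 || chunk > sizeof(buffer))
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	for (;;) {
		ssize_t ret = read(fd, buffer, chunk);

		if (ret < 0) {
			(void)close(fd);
			return -1;
		}
		if (ret == 0)
			break;	/* EOF */
		total += ret;
	}
	(void)close(fd);
	return total;
}
```

Calling read_in_chunks("/proc/sys/kernel/sched_domain/cpu0/domain0/flags", 8) in a loop would exercise the same file as the reproducer above with a predictable access pattern. Whether it trips the BUG as quickly as the random-sized reads is not known.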
Guess it's down to the following commit:

commit 5b9f8ff7b320a34af3dbcf04edb40d9b04f22f4a
Author: Valentin Schneider <valentin.schneider@arm.com>
Date:   Mon Aug 17 12:29:52 2020 +0100

    sched/debug: Output SD flag names rather than their values
After reverting commit 5b9f8ff7b320a34af3dbcf04edb40d9b04f22f4a, the crash does not occur.
+static int sd_ctl_doflags(struct ctl_table *table, int write,
+			  void *buffer, size_t *lenp, loff_t *ppos)
+{
+	unsigned long flags = *(unsigned long *)table->data;
+	size_t data_size = 0;
+	size_t len = 0;
+	char *tmp;
+	int idx;
+
+	if (write)
+		return 0;
+
+	for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
+		char *name = sd_flag_debug[idx].name;
+
+		/* Name plus whitespace */
+		data_size += strlen(name) + 1;
+	}
+
+	if (*ppos > data_size) {
+		*lenp = 0;
+		return 0;
+	}
+
+	tmp = kcalloc(data_size + 1, sizeof(*tmp), GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
+		char *name = sd_flag_debug[idx].name;
+
+		len += snprintf(tmp + len, strlen(name) + 2, "%s ", name);
+	}
+
+	tmp += *ppos;
+	len -= *ppos;
+
+	if (len > *lenp)
+		len = *lenp;
+	if (len)
+		memcpy(buffer, tmp, len);
+	if (len < *lenp) {
+		((char *)buffer)[len] = '\n';
+		len++;
+	}
+
+	*lenp = len;
+	*ppos += len;
+
+	kfree(tmp);

This kfree looks suspect, tmp has been bumped earlier by *ppos, so we're passing a bogus tmp to kfree.

+
+	return 0;
+}
This seems to fix it, I'll send the fix upstream

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 0655524700d2..2357921580f9 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -251,7 +251,7 @@ static int sd_ctl_doflags(struct ctl_table *table, int write,
 	unsigned long flags = *(unsigned long *)table->data;
 	size_t data_size = 0;
 	size_t len = 0;
-	char *tmp;
+	char *tmp, *buf;
 	int idx;
 
 	if (write)
@@ -269,17 +269,17 @@ static int sd_ctl_doflags(struct ctl_table *table, int write,
 		return 0;
 	}
 
-	tmp = kcalloc(data_size + 1, sizeof(*tmp), GFP_KERNEL);
-	if (!tmp)
+	buf = kcalloc(data_size + 1, sizeof(*buf), GFP_KERNEL);
+	if (!buf)
 		return -ENOMEM;
 
 	for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
 		char *name = sd_flag_debug[idx].name;
 
-		len += snprintf(tmp + len, strlen(name) + 2, "%s ", name);
+		len += snprintf(buf + len, strlen(name) + 2, "%s ", name);
 	}
 
-	tmp += *ppos;
+	tmp = buf + *ppos;
 	len -= *ppos;
 
 	if (len > *lenp)
@@ -294,7 +294,7 @@ static int sd_ctl_doflags(struct ctl_table *table, int write,
 
 	*lenp = len;
 	*ppos += len;
 
-	kfree(tmp);
+	kfree(buf);
 
 	return 0;
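For anyone who wants to poke at the pattern outside the kernel, here is a rough user-space model of the fixed read path: one pointer (buf) owns the allocation and is the only one passed to free(), while a second pointer (tmp) is just a cursor at the requested offset. The function model_read and its parameters are invented for this sketch. It uses calloc()/free() in place of kcalloc()/kfree() and leaves out the trailing newline handling, so it is an illustration of the idea rather than the kernel code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * User-space model of an offset-aware read handler (hypothetical; not
 * the kernel's sd_ctl_doflags()). Copies up to *lenp bytes of `data`,
 * starting at offset *ppos, into `out`, then updates *lenp with the
 * number of bytes copied and advances *ppos.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int model_read(const char *data, char *out, size_t *lenp, size_t *ppos)
{
	size_t data_size = strlen(data);
	size_t len;
	char *buf, *tmp;

	if (*ppos > data_size) {
		*lenp = 0;
		return 0;
	}

	/* buf owns the allocation for the whole function. */
	buf = calloc(data_size + 1, sizeof(*buf));
	if (!buf)
		return -1;
	memcpy(buf, data, data_size);

	/* tmp is only a cursor into buf; it must never be freed. */
	tmp = buf + *ppos;
	len = data_size - *ppos;

	if (len > *lenp)
		len = *lenp;
	if (len)
		memcpy(out, tmp, len);

	*lenp = len;
	*ppos += len;

	free(buf);	/* free exactly the pointer the allocator returned */
	return 0;
}
```

Calling model_read() repeatedly with a small *lenp walks through the string chunk by chunk, and every call frees the pointer it allocated regardless of the offset. In the pre-fix version the equivalent of free(tmp) would only have been valid when *ppos was 0.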
On Wed, Oct 28, 2020 at 04:36:11PM -0700, Andrew Morton wrote:
>(switched to email. Please respond via emailed reply-to-all, not via the
>bugzilla web interface).
>
>On Wed, 28 Oct 2020 15:49:15 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>> [ 984.279683] Call Trace:
>> [ 984.282581] __check_heap_object+0xe0/0x110
>> [ 984.287405] __check_object_size+0x136/0x150
>> [ 984.292347] proc_sys_call_handler+0x167/0x250
>> [ 984.297565] new_sync_read+0x108/0x180
>> [ 984.302082] vfs_read+0x174/0x1d0
>> [ 984.306126] ksys_read+0x58/0xd0
>> [ 984.310022] do_syscall_64+0x33/0x40
>> [ 984.314277] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>Can we determine which /proc/sys entries these are?

Colin Ian King narrowed it down to
/proc/sys/kernel/sched_domain/cpu0/domain0/flags
and commit 5b9f8ff7b320a34af3dbcf04edb40d9b04f22f4a.

He has a proposed patch, too, that he'll be sending to the list soon.

See https://bugzilla.kernel.org/show_bug.cgi?id=209919#c9

Thanks for the quick debugging, Colin!
https://lkml.org/lkml/2020/10/29/848
On Thu, Oct 29, 2020 at 10:22:51AM -0500, Jeffrey Bastian wrote:
>He has a proposed patch, too, that he'll be sending to the list soon.

https://lkml.org/lkml/2020/10/29/848