This bug also seems to affect other users / hardware:
https://www.spinics.net/lists/kernel/msg4294764.html
(H710P: LSI 2008 / H730 mini & H730P: LSI 3108)

Apart from the kernel message, everything seems to be working so far.

AVAGO MegaRAID SAS 9361-4i controller:

Basics :
======
Controller = 0
Model = AVAGO MegaRAID SAS 9361-4i
Serial Number = SK71088275
Current Controller Date/Time = 05/05/2022, 12:55:31
Current System Date/time = 05/05/2022, 14:55:30
SAS Address = 500605b00cd3ce20
PCI Address = 00:51:00:00
Mfg Date = 03/13/17
Rework Date = 00/00/00
Revision No = 12A

Version :
=======
Firmware Package Build = 24.21.0-0148
Firmware Version = 4.680.00-8555
CPLD Version = 26747-01A
Bios Version = 6.36.00.3_4.19.08.00_0x06180205
HII Version = 03.25.05.14
Ctrl-R Version = 5.19-0606
Preboot CLI Version = 01.07-05:#%0000
NVDATA Version = 3.1705.00-0024
Boot Block Version = 3.07.00.00-0004
Driver Name = megaraid_sas
Driver Version = 07.717.02.00-rc1

Kernel message:

================================================================================
UBSAN: array-index-out-of-bounds in /build/linux-HMZHpV/linux-5.15.0/drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
CPU: 41 PID: 268 Comm: kworker/41:0H Not tainted 5.15.0-27-generic #28-Ubuntu
Hardware name: Supermicro Super Server/H11DSU-iN, BIOS 1.3 07/15/2019
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
 <TASK>
 show_stack+0x52/0x58
 dump_stack_lvl+0x4a/0x5f
 dump_stack+0x10/0x12
 ubsan_epilogue+0x9/0x45
 __ubsan_handle_out_of_bounds.cold+0x44/0x49
 MR_BuildRaidContext+0xa5a/0xb50 [megaraid_sas]
 megasas_build_ldio_fusion+0x5b5/0x9a0 [megaraid_sas]
 megasas_build_io_fusion+0x40e/0x450 [megaraid_sas]
 megasas_build_and_issue_cmd_fusion+0xa5/0x370 [megaraid_sas]
 megasas_queue_command+0x1b5/0x1f0 [megaraid_sas]
 ? ktime_get+0x46/0xc0
 scsi_dispatch_cmd+0x93/0x1f0
 scsi_queue_rq+0x2d1/0x690
 blk_mq_dispatch_rq_list+0x126/0x600
 ? __sbitmap_queue_get+0x1/0x10
 __blk_mq_do_dispatch_sched+0xba/0x2d0
 ? ttwu_do_wakeup+0x1c/0x160
 __blk_mq_sched_dispatch_requests+0x104/0x150
 blk_mq_sched_dispatch_requests+0x35/0x60
 __blk_mq_run_hw_queue+0x34/0xb0
 blk_mq_run_work_fn+0x1b/0x20
 process_one_work+0x22b/0x3d0
 worker_thread+0x53/0x410
 ? process_one_work+0x3d0/0x3d0
 kthread+0x12a/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in /build/linux-HMZHpV/linux-5.15.0/drivers/scsi/megaraid/megaraid_sas_fp.c:103:32
index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
CPU: 41 PID: 268 Comm: kworker/41:0H Not tainted 5.15.0-27-generic #28-Ubuntu
Hardware name: Supermicro Super Server/H11DSU-iN, BIOS 1.3 07/15/2019
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
 <TASK>
 show_stack+0x52/0x58
 dump_stack_lvl+0x4a/0x5f
 dump_stack+0x10/0x12
 ubsan_epilogue+0x9/0x45
 __ubsan_handle_out_of_bounds.cold+0x44/0x49
 ? _printk+0x58/0x6f
 MR_GetPhyParams+0x3d9/0x700 [megaraid_sas]
 ? ubsan_epilogue+0x15/0x45
 MR_BuildRaidContext+0x402/0xb50 [megaraid_sas]
 megasas_build_ldio_fusion+0x5b5/0x9a0 [megaraid_sas]
 megasas_build_io_fusion+0x40e/0x450 [megaraid_sas]
 megasas_build_and_issue_cmd_fusion+0xa5/0x370 [megaraid_sas]
 megasas_queue_command+0x1b5/0x1f0 [megaraid_sas]
 ? ktime_get+0x46/0xc0
 scsi_dispatch_cmd+0x93/0x1f0
 scsi_queue_rq+0x2d1/0x690
 blk_mq_dispatch_rq_list+0x126/0x600
 ? __sbitmap_queue_get+0x1/0x10
 __blk_mq_do_dispatch_sched+0xba/0x2d0
 ? ttwu_do_wakeup+0x1c/0x160
 __blk_mq_sched_dispatch_requests+0x104/0x150
 blk_mq_sched_dispatch_requests+0x35/0x60
 __blk_mq_run_hw_queue+0x34/0xb0
 blk_mq_run_work_fn+0x1b/0x20
 process_one_work+0x22b/0x3d0
 worker_thread+0x53/0x410
 ? process_one_work+0x3d0/0x3d0
 kthread+0x12a/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in /build/linux-HMZHpV/linux-5.15.0/drivers/scsi/megaraid/megaraid_sas_fp.c:115:31
index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
CPU: 41 PID: 268 Comm: kworker/41:0H Not tainted 5.15.0-27-generic #28-Ubuntu
Hardware name: Supermicro Super Server/H11DSU-iN, BIOS 1.3 07/15/2019
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
 <TASK>
 show_stack+0x52/0x58
 dump_stack_lvl+0x4a/0x5f
 dump_stack+0x10/0x12
 ubsan_epilogue+0x9/0x45
 __ubsan_handle_out_of_bounds.cold+0x44/0x49
 ? _printk+0x58/0x6f
 MR_GetPhyParams+0x509/0x700 [megaraid_sas]
 MR_BuildRaidContext+0x402/0xb50 [megaraid_sas]
 megasas_build_ldio_fusion+0x5b5/0x9a0 [megaraid_sas]
 megasas_build_io_fusion+0x40e/0x450 [megaraid_sas]
 megasas_build_and_issue_cmd_fusion+0xa5/0x370 [megaraid_sas]
 megasas_queue_command+0x1b5/0x1f0 [megaraid_sas]
 ? ktime_get+0x46/0xc0
 scsi_dispatch_cmd+0x93/0x1f0
 scsi_queue_rq+0x2d1/0x690
 blk_mq_dispatch_rq_list+0x126/0x600
 ? __sbitmap_queue_get+0x1/0x10
 __blk_mq_do_dispatch_sched+0xba/0x2d0
 ? ttwu_do_wakeup+0x1c/0x160
 __blk_mq_sched_dispatch_requests+0x104/0x150
 blk_mq_sched_dispatch_requests+0x35/0x60
 __blk_mq_run_hw_queue+0x34/0xb0
 blk_mq_run_work_fn+0x1b/0x20
 process_one_work+0x22b/0x3d0
 worker_thread+0x53/0x410
 ? process_one_work+0x3d0/0x3d0
 kthread+0x12a/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in /build/linux-HMZHpV/linux-5.15.0/drivers/scsi/megaraid/megaraid_sas_fp.c:125:9
index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
CPU: 41 PID: 268 Comm: kworker/41:0H Not tainted 5.15.0-27-generic #28-Ubuntu
Hardware name: Supermicro Super Server/H11DSU-iN, BIOS 1.3 07/15/2019
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
 <TASK>
 show_stack+0x52/0x58
 dump_stack_lvl+0x4a/0x5f
 dump_stack+0x10/0x12
 ubsan_epilogue+0x9/0x45
 __ubsan_handle_out_of_bounds.cold+0x44/0x49
 ? _printk+0x58/0x6f
 MR_GetPhyParams+0x407/0x700 [megaraid_sas]
 MR_BuildRaidContext+0x402/0xb50 [megaraid_sas]
 megasas_build_ldio_fusion+0x5b5/0x9a0 [megaraid_sas]
 megasas_build_io_fusion+0x40e/0x450 [megaraid_sas]
 megasas_build_and_issue_cmd_fusion+0xa5/0x370 [megaraid_sas]
 megasas_queue_command+0x1b5/0x1f0 [megaraid_sas]
 ? ktime_get+0x46/0xc0
 scsi_dispatch_cmd+0x93/0x1f0
 scsi_queue_rq+0x2d1/0x690
 blk_mq_dispatch_rq_list+0x126/0x600
 ? __sbitmap_queue_get+0x1/0x10
 __blk_mq_do_dispatch_sched+0xba/0x2d0
 ? ttwu_do_wakeup+0x1c/0x160
 __blk_mq_sched_dispatch_requests+0x104/0x150
 blk_mq_sched_dispatch_requests+0x35/0x60
 __blk_mq_run_hw_queue+0x34/0xb0
 blk_mq_run_work_fn+0x1b/0x20
 process_one_work+0x22b/0x3d0
 worker_thread+0x53/0x410
 ? process_one_work+0x3d0/0x3d0
 kthread+0x12a/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in /build/linux-HMZHpV/linux-5.15.0/drivers/scsi/megaraid/megaraid_sas_fp.c:151:32
index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
CPU: 41 PID: 268 Comm: kworker/41:0H Not tainted 5.15.0-27-generic #28-Ubuntu
Hardware name: Supermicro Super Server/H11DSU-iN, BIOS 1.3 07/15/2019
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
 <TASK>
 show_stack+0x52/0x58
 dump_stack_lvl+0x4a/0x5f
 dump_stack+0x10/0x12
 ubsan_epilogue+0x9/0x45
 __ubsan_handle_out_of_bounds.cold+0x44/0x49
 ? _printk+0x58/0x6f
 MR_GetPhyParams+0x47f/0x700 [megaraid_sas]
 MR_BuildRaidContext+0x402/0xb50 [megaraid_sas]
 megasas_build_ldio_fusion+0x5b5/0x9a0 [megaraid_sas]
 megasas_build_io_fusion+0x40e/0x450 [megaraid_sas]
 megasas_build_and_issue_cmd_fusion+0xa5/0x370 [megaraid_sas]
 megasas_queue_command+0x1b5/0x1f0 [megaraid_sas]
 ? ktime_get+0x46/0xc0
 scsi_dispatch_cmd+0x93/0x1f0
 scsi_queue_rq+0x2d1/0x690
 blk_mq_dispatch_rq_list+0x126/0x600
 ? __sbitmap_queue_get+0x1/0x10
 __blk_mq_do_dispatch_sched+0xba/0x2d0
 ? ttwu_do_wakeup+0x1c/0x160
 __blk_mq_sched_dispatch_requests+0x104/0x150
 blk_mq_sched_dispatch_requests+0x35/0x60
 __blk_mq_run_hw_queue+0x34/0xb0
 blk_mq_run_work_fn+0x1b/0x20
 process_one_work+0x22b/0x3d0
 worker_thread+0x53/0x410
 ? process_one_work+0x3d0/0x3d0
 kthread+0x12a/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
================================================================================
Created attachment 300986 [details]
drivers: scsi: megaraid: fix ldSpanMap array declarations

It looks like the ldSpanMap arrays are declared with a length of 1, whilst the accompanying ldTgtIdToLd lookup is set up using max limits. This is quite old code (2010), which makes me a bit suspicious that I've missed something about how it works. But I couldn't find anything in the current source or commit logs to explain why it was written this way, so from what I can tell it looks like an honest oversight.

I've attached a patch that matches the lengths of ldSpanMap and ldTgtIdToLd in the two cases I was able to identify. Is it possible to test with this patch applied?
Created attachment 301055 [details]
dmesg with UBSAN traces

We're seeing a similar thing on Ubuntu 22.04's 5.15-based kernel (kernel log attached).

MR_DRV_RAID_MAP ends with a single "struct MR_LD_SPAN_MAP ldSpanMap[1]", but in MR_DRV_RAID_MAP_ALL it is always followed by the field "struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN - 1]". Even though the access looks like it's going off the end, the attached backtraces are accessing MR_DRV_RAID_MAP_ALL's ldSpanMap. So the attached traces are arguably false positives, but drivers/scsi/megaraid is using an unusual idiom.

I assume that if it declared "struct MR_LD_SPAN_MAP ldSpanMap[0]", it would not trigger the warning? But it also seems that in most (all?) of these cases it has access to the MR_DRV_RAID_MAP_ALL anyway. (MR_FW_RAID_MAP and MR_FW_RAID_MAP_ALL seem to be in a similar situation, but I didn't look at them as closely.)
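To make the idiom described above concrete, here is a minimal sketch in plain C. All struct and function names (span_map, raid_map, raid_map_all, spans_are_contiguous, get_span) and the MAX_LD value are hypothetical stand-ins; the real driver structures carry many more fields. It only shows why index 1 of the inner one-element array lands on the first element of the outer array, and how a lookup could stay inside declared bounds.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LD 4 /* hypothetical stand-in for MAX_LOGICAL_DRIVES_DYN */

/* Simplified stand-in for struct MR_LD_SPAN_MAP. */
struct span_map {
    unsigned int id;
};

/* Simplified stand-in for struct MR_DRV_RAID_MAP: ends in a one-element array. */
struct raid_map {
    unsigned int ld_count;
    struct span_map ldSpanMap[1];
};

/* Simplified stand-in for struct MR_DRV_RAID_MAP_ALL: remaining entries follow. */
struct raid_map_all {
    struct raid_map raidMap;
    struct span_map ldSpanMap[MAX_LD - 1];
};

/* Returns 1 if the outer array starts exactly where raidMap.ldSpanMap[1] would be. */
int spans_are_contiguous(void)
{
    size_t inner_end = offsetof(struct raid_map_all, raidMap)
                     + offsetof(struct raid_map, ldSpanMap)
                     + sizeof(struct span_map);
    return inner_end == offsetof(struct raid_map_all, ldSpanMap);
}

/* Look up entry ld without indexing past the end of either declared array. */
struct span_map *get_span(struct raid_map_all *all, unsigned int ld)
{
    assert(ld < MAX_LD);
    if (ld == 0)
        return &all->raidMap.ldSpanMap[0];
    return &all->ldSpanMap[ld - 1];
}
```

With this simplified layout the two arrays are adjacent in memory, which is why an access such as raidMap.ldSpanMap[1] reads valid data even though a bounds checker that only sees the declared type 'span_map [1]' reports it as out of range.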
That makes a *bit* more sense. Is there something special about the zero-th entry which allows for treating MR_DRV_RAID_MAP_ALL as a MR_DRV_RAID_MAP some of the time? Something like that would explain why this code is set up in this way and has persisted in this way for so long.
I have started seeing exactly the same problem in the log as well. I only noticed after the controller for some odd reason gave up and the kernel obviously started complaining about all my SAS disks and was not able to access them. After a reboot I checked dmesg and found this UBSAN array-index-out-of-bounds message. In my case I have a 9361-8i and a Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 mainboard. I'm running kernel 5.15.35. The UBSAN messages appeared directly at boot.
That makes sense; the code path I saw this on was loading state from firmware. I guess we need to resolve whether/how we can safely "spill over" into the array in this manner. As above, there are some assumptions implicitly being made about the layout of the ldTgtIdToLd map and about the zero-th entry in ldSpanMap being special.
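For comparison, the general C99 way to express "a header followed by a run-time number of entries" is a flexible array member, where the declared type no longer claims a length of 1. The sketch below uses hypothetical names (span_map, raid_map_flex, raid_map_flex_size, raid_map_flex_alloc) and illustrates the language feature only; it is not taken from the driver or from any patch in this thread. Note that standard C does not allow a struct with a flexible array member to be embedded in another struct, so a wrapper like MR_DRV_RAID_MAP_ALL would need separate treatment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct MR_LD_SPAN_MAP. */
struct span_map {
    unsigned int id;
};

/* Hypothetical map whose trailing array has no fixed declared length. */
struct raid_map_flex {
    unsigned int ld_count;
    struct span_map ldSpanMap[]; /* C99 flexible array member */
};

/* Bytes needed for the header plus n trailing entries. */
size_t raid_map_flex_size(unsigned int n)
{
    return offsetof(struct raid_map_flex, ldSpanMap)
         + (size_t)n * sizeof(struct span_map);
}

/* Allocate a zeroed map with room for n entries; returns NULL on failure. */
struct raid_map_flex *raid_map_flex_alloc(unsigned int n)
{
    struct raid_map_flex *m = calloc(1, raid_map_flex_size(n));
    if (m)
        m->ld_count = n;
    return m;
}
```

Here every index below ld_count is inside the allocation, and the array type carries no bound for a checker to compare against; the size is tracked explicitly instead.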
See: https://lore.kernel.org/lkml/cover.1628136510.git.gustavoars@kernel.org
V3 and, hopefully, the final one. :) https://lore.kernel.org/linux-hardening/cover.1660592640.git.gustavoars@kernel.org/
(In reply to Gustavo A. R. Silva from comment #7)
> V3 and, hopefully, the final one. :)
>
> https://lore.kernel.org/linux-hardening/cover.1660592640.git.gustavoars@kernel.org/

JFYI this patch series has been taken by the scsi/megaraid/ maintainers:
https://lore.kernel.org/linux-hardening/yq1k06z8vaw.fsf@ca-mkp.ca.oracle.com/
Is this a serious issue or only a cosmetic one? I'm getting unsure because of "i only noticed after the controller for some odd reason gave up and the kernel obviously started complaining about all my sas disks and not able to access them."

I put our 3108-based MegaRAID controller into JBOD personality mode because I'm using the drives with Proxmox/ZFS (see the excerpt from the manual at https://forum.proxmox.com/threads/megaraid-personality-mode.93857/ ). When booting, I get this message, but it seems to work fine. In RAID personality mode, this doesn't happen.

Should I keep the controller in RAID mode (with JBOD enabled) until the kernel patch is in Proxmox, or can/should I simply ignore these errors?
> is this a serious issue or only cosmetic one?

the original bug report is cosmetic. the linked patch set is not a functional change, it fixes the cosmetic issue. there might be other issues in this driver (which should go in another bug i guess?). but if you're seeing the same backtraces then you're probably just having the cosmetic issue.
ATOM BIOS: CAPILANO
radeon 0000:02:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
radeon 0000:02:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[drm] Detected VRAM RAM=1024M, BAR=256M
[drm] RAM width 128bits DDR
[drm] radeon: 1024M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
[drm] Loading REDWOOD Microcode
[drm] Internal thermal controller without fan control
================================================================================
UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_atombios.c:2620:43
index 1 is out of range for type 'UCHAR [1]'
CPU: 2 PID: 140 Comm: systemd-udevd Not tainted 6.5.1-060501-generic #202309020842
Hardware name: Acer Aspire 7741/JE70_CP, BIOS V1.26 04/28/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0x48/0x70
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xc6/0x110
 radeon_atombios_parse_power_table_4_5+0x3c9/0x3f0 [radeon]
 radeon_atombios_get_power_modes+0x205/0x210 [radeon]
 radeon_pm_init_dpm+0x8e/0x2f0 [radeon]
 radeon_pm_init+0xd0/0x100 [radeon]
 evergreen_init+0x158/0x400 [radeon]
 radeon_device_init+0x540/0xa90 [radeon]
 radeon_driver_load_kms+0xcc/0x2f0 [radeon]
 drm_dev_register+0x10e/0x240 [drm]
 radeon_pci_probe+0xec/0x180 [radeon]
 local_pci_probe+0x47/0xb0
 pci_call_probe+0x55/0x190
 pci_device_probe+0x84/0x120
 really_probe+0x1c7/0x410
 __driver_probe_device+0x8c/0x180
 driver_probe_device+0x24/0xd0
 __driver_attach+0x10b/0x210
 ? __pfx___driver_attach+0x10/0x10
 bus_for_each_dev+0x8d/0xf0
 driver_attach+0x1e/0x30
 bus_add_driver+0x127/0x240
 driver_register+0x5e/0x130
 ? __pfx_radeon_module_init+0x10/0x10 [radeon]
 __pci_register_driver+0x62/0x70
 __pci_register_driver+0x62/0x70
 radeon_module_init+0x4c/0xff0 [radeon]
 do_one_initcall+0x5e/0x340
 do_init_module+0x68/0x260
 load_module+0xba1/0xcf0
 ? ima_post_read_file+0xe8/0x110
 ? security_kernel_post_read_file+0x75/0x90
 init_module_from_file+0x96/0x100
 ? init_module_from_file+0x96/0x100
 idempotent_init_module+0x11c/0x2b0
 __x64_sys_finit_module+0x64/0xd0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x68/0x90
 ? syscall_exit_to_user_mode+0x37/0x60
 ? do_syscall_64+0x68/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fe86023089d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 0>
RSP: 002b:00007fffe4257b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000560b8b980a10 RCX: 00007fe86023089d
RDX: 0000000000000000 RSI: 00007fe8603b0458 RDI: 0000000000000013
RBP: 00007fe8603b0458 R08: 0000000000000000 R09: 00007fffe4257c30
R10: 0000000000000013 R11: 0000000000000246 R12: 0000000000020000
R13: 0000560b8b978140 R14: 0000000000000000 R15: 0000560b8b90faf0
 </TASK>
================================================================================
That's an array-out-of-bounds issue, yes, but it doesn't look like it's related to the megaraid issue reported in this ticket. I guess you googled for array-out-of-bounds and found this one. UBSAN is generic infrastructure for improving code/kernel quality: https://www.kernel.org/doc/html/v4.19/dev-tools/ubsan.html So please file a separate bug report with the proper assignment/subsystem.