Bug 215943
| Summary: | UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32 | | |
| --- | --- | --- | --- |
| Product: | IO/Storage | Reporter: | christian.d.dietrich |
| Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
| Status: | NEW | | |
| Severity: | normal | CC: | charlotte, darren.armstrong85, devzero, eliastorres, fnbrier, gustavo, kees, torbjorn, ubuntologic |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 5.15.27 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:

- drivers: scsi: megaraid: fix ldSpanMap array declarations
- dmesg with UBSAN traces
Description (christian.d.dietrich, 2022-05-05 13:03:09 UTC)
Created attachment 300986: drivers: scsi: megaraid: fix ldSpanMap array declarations

It looks like the ldSpanMap arrays are declared with a length of 1, whilst the accompanying ldTgtIdToLd lookup is set up using the maximum limits.

This is quite old code (2010), which makes me a bit suspicious that I've missed something about how it works. But I couldn't find anything in the current source or commit logs to explain why it is this way, so it looks like an honest oversight from what I can tell.

I've attached a patch that matches the lengths of ldSpanMap and ldTgtIdToLd in the two cases I was able to identify. Is it possible to test with this patch applied?
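To illustrate the mismatch, here is a simplified sketch of the pattern being described (field and macro names follow the driver, but the surrounding fields and exact sizes are placeholders; this is not the verbatim kernel code):

```c
#include <stdint.h>

typedef uint16_t u16;

#define MAX_LOGICAL_DRIVES_DYN 512 /* illustrative value */

struct MR_LD_SPAN_MAP {
	/* ... span/RAID geometry for one logical drive ... */
	u16 placeholder;
};

struct MR_DRV_RAID_MAP {
	/* ... */
	u16 ldTgtIdToLd[MAX_LOGICAL_DRIVES_DYN]; /* sized for the maximum */
	/* ... */
	struct MR_LD_SPAN_MAP ldSpanMap[1];      /* declared with length 1 */
};

/*
 * A lookup resolved through ldTgtIdToLd can yield any index up to
 * MAX_LOGICAL_DRIVES_DYN - 1, so indexing ldSpanMap[ld] with ld > 0 is
 * out of bounds for the declared [1] type -- which is what UBSAN flags
 * at megaraid_sas_fp.c:103.
 */
```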
Created attachment 301055: dmesg with UBSAN traces

We're seeing a similar thing on Ubuntu 22.04's 5.15-based kernel (kernel log attached).
MR_DRV_RAID_MAP ends with a single "struct MR_LD_SPAN_MAP ldSpanMap[1]", but in MR_DRV_RAID_MAP_ALL it is always followed by the field "struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN - 1]". So even though the access looks like it's going off the end, the attached backtraces are really accessing MR_DRV_RAID_MAP_ALL's ldSpanMap.

The attached traces are therefore arguably false positives, but drivers/scsi/megaraid is using an unusual idiom.
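Concretely, the idiom looks something like this (a sketch based on the layout described above; the real structs have many more fields, and the packing is assumed so that the two arrays abut):

```c
struct MR_DRV_RAID_MAP_ALL {
	struct MR_DRV_RAID_MAP raidMap; /* ends with ldSpanMap[1] */
	struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN - 1];
} __attribute__((packed));

/*
 * Because the trailing ldSpanMap immediately follows raidMap's single
 * entry, the two arrays form one contiguous run of
 * MAX_LOGICAL_DRIVES_DYN entries. An access such as
 *
 *     map_all->raidMap.ldSpanMap[ld]   // ld in [0, MAX_LOGICAL_DRIVES_DYN)
 *
 * therefore stays inside the MR_DRV_RAID_MAP_ALL allocation, but is out
 * of bounds for the declared 'MR_LD_SPAN_MAP [1]' type -- hence the
 * (arguably false-positive) UBSAN reports.
 */
```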
I assume that if it were "struct MR_LD_SPAN_MAP ldSpanMap[0]" it would not trigger the warning? But it also seems like in most (all?) of these cases the code has access to the full MR_DRV_RAID_MAP_ALL anyway. (MR_FW_RAID_MAP and MR_FW_RAID_MAP_ALL seem to be in a similar situation, but I didn't look at them as closely.)
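A small userspace analogue speaks to the [0] question (hypothetical code, nothing here is from the driver). By default, compilers treat a trailing [0] like a flexible array member and don't bounds-instrument it, whereas a trailing [1] is diagnosed by Clang's `-fsanitize=array-bounds` and by GCC's `-fsanitize=bounds-strict`; newer compilers can tighten this with `-fstrict-flex-arrays`:

```c
/*
 * repro.c -- hypothetical analogue of the ldSpanMap layout; not driver code.
 * Build: clang -fsanitize=array-bounds repro.c && ./a.out
 *   (or: gcc -fsanitize=bounds-strict repro.c)
 */
#include <stdio.h>
#include <stdlib.h>

struct entry { int v; };

struct map {                   /* plays the role of MR_DRV_RAID_MAP     */
	int hdr;
	struct entry span[1];  /* change to span[0] (a GNU zero-length
	                        * array) and, with default flags, the
	                        * warning disappears                     */
};

struct map_all {               /* plays the role of MR_DRV_RAID_MAP_ALL */
	struct map m;
	struct entry rest[7];  /* storage for the remaining entries     */
};

int main(void)
{
	struct map_all *all = calloc(1, sizeof(*all));

	if (!all)
		return 1;

	/* Inside the allocation, but outside the declared span[1] type:
	 * UBSAN flags this access (index 2, type 'entry [1]'). */
	all->m.span[2].v = 42;

	printf("%d\n", all->m.span[2].v);
	free(all);
	return 0;
}
```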
That makes a *bit* more sense. Is there something special about the zero-th entry which allows for treating MR_DRV_RAID_MAP_ALL as a MR_DRV_RAID_MAP some of the time? Something like that would explain why this code is set up this way and has persisted this way for so long.

I have started seeing exactly the same problem in the log as well. I only noticed after the controller, for some odd reason, gave up and the kernel obviously started complaining about all my SAS disks and not being able to access them. After a reboot I checked dmesg and then found this UBSAN array-index-out-of-bounds. In my case it is a 9361-8i on a Supermicro C7X99-OCE-F mainboard (BIOS 2.1a, 06/15/2018). I'm running kernel 5.15.35; the UBSAN messages appeared directly at boot.

That makes sense, the code path I saw this on was loading state from firmware. I guess we need to resolve whether/how we can safely "spill over" into the array in this manner. As above, there are some assumptions implicitly being made about the layout of the ldTgtIdToLd map and that the zero-th entry in ldSpanMap is special.

V3 and, hopefully, the final one. :)

https://lore.kernel.org/linux-hardening/cover.1660592640.git.gustavoars@kernel.org/

(In reply to Gustavo A. R. Silva from comment #7)
> V3 and, hopefully, the final one. :)
>
> https://lore.kernel.org/linux-hardening/cover.1660592640.git.gustavoars@kernel.org/

JFYI, this patch series has been taken by the scsi/megaraid maintainers:

https://lore.kernel.org/linux-hardening/yq1k06z8vaw.fsf@ca-mkp.ca.oracle.com/

Is this a serious issue or only a cosmetic one? I'm getting unsure because of "I only noticed after the controller, for some odd reason, gave up and the kernel obviously started complaining about all my SAS disks".

I put our 3108-based MegaRAID controller into JBOD personality mode because I'm using the drives with Proxmox/ZFS (see the excerpt from the manual at https://forum.proxmox.com/threads/megaraid-personality-mode.93857/). When booting, I get this message, but it seems to work fine. In RAID personality mode, this doesn't happen. Should I keep the controller in RAID mode (with JBOD enabled) until the patched kernel is in Proxmox, or can/should I simply ignore these errors?

> is this a serious issue or only cosmetic one?
The original bug report is cosmetic, and the linked patch set is not a functional change: it fixes the cosmetic issue.

There might be other issues in this driver (which should go in another bug, I guess?), but if you're seeing the same backtraces then you're probably just hitting the cosmetic issue.
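For context, the linked series follows the kernel's usual conversion of one-element arrays to C99 flexible array members. Roughly, the shape of the change is as follows (a sketch of the idiom, not the verbatim patches; see the lore thread for the real diffs):

```c
struct MR_DRV_RAID_MAP {
	/* ... */
	struct MR_LD_SPAN_MAP ldSpanMap[]; /* was: ldSpanMap[1] */
};

struct MR_DRV_RAID_MAP_ALL {
	struct MR_DRV_RAID_MAP raidMap;
	/*
	 * The flexible array contributes no size of its own, so the
	 * trailing array grows by one element to keep the overall layout
	 * and sizeof(struct MR_DRV_RAID_MAP_ALL) unchanged.
	 */
	struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN]; /* was: ... - 1 */
} __attribute__((packed));

/*
 * raidMap.ldSpanMap[ld] is now an access through a true flexible array
 * member, which the bounds sanitizer does not instrument, so the
 * warnings go away without any functional change.
 */
```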
Seeing this array-index-out-of-bounds on a radeon system:

```
ATOM BIOS: CAPILANO
radeon 0000:02:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
radeon 0000:02:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[drm] Detected VRAM RAM=1024M, BAR=256M
[drm] RAM width 128bits DDR
[drm] radeon: 1024M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
[drm] Loading REDWOOD Microcode
[drm] Internal thermal controller without fan control
================================================================================
UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_atombios.c:2620:43
index 1 is out of range for type 'UCHAR [1]'
CPU: 2 PID: 140 Comm: systemd-udevd Not tainted 6.5.1-060501-generic #202309020842
Hardware name: Acer Aspire 7741/JE70_CP, BIOS V1.26 04/28/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0x48/0x70
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xc6/0x110
 radeon_atombios_parse_power_table_4_5+0x3c9/0x3f0 [radeon]
 radeon_atombios_get_power_modes+0x205/0x210 [radeon]
 radeon_pm_init_dpm+0x8e/0x2f0 [radeon]
 radeon_pm_init+0xd0/0x100 [radeon]
 evergreen_init+0x158/0x400 [radeon]
 radeon_device_init+0x540/0xa90 [radeon]
 radeon_driver_load_kms+0xcc/0x2f0 [radeon]
 drm_dev_register+0x10e/0x240 [drm]
 radeon_pci_probe+0xec/0x180 [radeon]
 local_pci_probe+0x47/0xb0
 pci_call_probe+0x55/0x190
 pci_device_probe+0x84/0x120
 really_probe+0x1c7/0x410
 __driver_probe_device+0x8c/0x180
 driver_probe_device+0x24/0xd0
 __driver_attach+0x10b/0x210
 ? __pfx___driver_attach+0x10/0x10
 bus_for_each_dev+0x8d/0xf0
 driver_attach+0x1e/0x30
 bus_add_driver+0x127/0x240
 driver_register+0x5e/0x130
 ? __pfx_radeon_module_init+0x10/0x10 [radeon]
 __pci_register_driver+0x62/0x70
 radeon_module_init+0x4c/0xff0 [radeon]
 do_one_initcall+0x5e/0x340
 do_init_module+0x68/0x260
 load_module+0xba1/0xcf0
 ? ima_post_read_file+0xe8/0x110
 ? security_kernel_post_read_file+0x75/0x90
 init_module_from_file+0x96/0x100
 ? init_module_from_file+0x96/0x100
 idempotent_init_module+0x11c/0x2b0
 __x64_sys_finit_module+0x64/0xd0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x68/0x90
 ? syscall_exit_to_user_mode+0x37/0x60
 ? do_syscall_64+0x68/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fe86023089d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 0>
RSP: 002b:00007fffe4257b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000560b8b980a10 RCX: 00007fe86023089d
RDX: 0000000000000000 RSI: 00007fe8603b0458 RDI: 0000000000000013
RBP: 00007fe8603b0458 R08: 0000000000000000 R09: 00007fffe4257c30
R10: 0000000000000013 R11: 0000000000000246 R12: 0000000000020000
R13: 0000560b8b978140 R14: 0000000000000000 R15: 0000560b8b90faf0
 </TASK>
================================================================================
```

That's an array-index-out-of-bounds issue, yes, but it doesn't look like it's related to the megaraid issue reported in this ticket. I guess you googled for array-index-out-of-bounds and found this one. UBSAN is generic infrastructure for improving code/kernel quality:

https://www.kernel.org/doc/html/v4.19/dev-tools/ubsan.html

So please post a bug report with the proper assignment/subsystem.