Bug 215943
| Summary: | UBSAN: array-index-out-of-bounds in drivers/scsi/megaraid/megaraid_sas_fp.c:103:32 | | |
| --- | --- | --- | --- |
| Product: | IO/Storage | Reporter: | christian.d.dietrich |
| Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
| Status: | NEW | | |
| Severity: | normal | CC: | charlotte, darren.armstrong85, devzero, eliastorres, fnbrier, gustavo, kees, torbjorn, ubuntologic |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 5.15.27 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:

- drivers: scsi: megaraid: fix ldSpanMap array declarations
- dmesg with UBSAN traces
Description (christian.d.dietrich, 2022-05-05 13:03:09 UTC)
Created attachment 300986: drivers: scsi: megaraid: fix ldSpanMap array declarations

It looks like the ldSpanMap arrays are declared with a length of 1, whilst the accompanying ldTgtIdToLd lookup is set up using the maximum limits.

This is quite old code (2010), which makes me a bit suspicious that I've missed something about how it works. But I couldn't find anything in the current source or commit logs to explain why it is this way, so it looks like an honest oversight from what I can tell.

I've attached a patch that matches the lengths of ldSpanMap and ldTgtIdToLd in the two cases I was able to identify. Is it possible to test with this patch applied?
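To illustrate the mismatch, here is a simplified sketch of the pattern being described (field and macro names follow the driver, but the surrounding fields and exact sizes are placeholders; this is not the verbatim kernel code):

```c
#include <stdint.h>

typedef uint16_t u16;

#define MAX_LOGICAL_DRIVES_DYN 512 /* illustrative value */

struct MR_LD_SPAN_MAP {
	/* ... span/RAID geometry for one logical drive ... */
	u16 placeholder;
};

struct MR_DRV_RAID_MAP {
	/* ... */
	u16 ldTgtIdToLd[MAX_LOGICAL_DRIVES_DYN]; /* sized for the maximum */
	/* ... */
	struct MR_LD_SPAN_MAP ldSpanMap[1];      /* declared with length 1 */
};

/*
 * A lookup resolved through ldTgtIdToLd can yield any index up to
 * MAX_LOGICAL_DRIVES_DYN - 1, so indexing ldSpanMap[ld] with ld > 0 is
 * out of bounds for the declared [1] type -- which is what UBSAN flags
 * at megaraid_sas_fp.c:103.
 */
```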
Created attachment 301055: dmesg with UBSAN traces

We're seeing a similar thing on Ubuntu 22.04's 5.15-based kernel (kernel log attached).
MR_DRV_RAID_MAP ends with a single "struct MR_LD_SPAN_MAP ldSpanMap[1]", but in MR_DRV_RAID_MAP_ALL it is always followed by the field "struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN - 1]". So even though the access looks like it's going off the end, the attached backtraces are really accessing MR_DRV_RAID_MAP_ALL's ldSpanMap.

The attached traces are therefore arguably false positives, but drivers/scsi/megaraid is using an unusual idiom.
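Concretely, the idiom looks something like this (a sketch based on the layout described above; the real structs have many more fields, and the packing is assumed so that the two arrays abut):

```c
struct MR_DRV_RAID_MAP_ALL {
	struct MR_DRV_RAID_MAP raidMap; /* ends with ldSpanMap[1] */
	struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN - 1];
} __attribute__((packed));

/*
 * Because the trailing ldSpanMap immediately follows raidMap's single
 * entry, the two arrays form one contiguous run of
 * MAX_LOGICAL_DRIVES_DYN entries. An access such as
 *
 *     map_all->raidMap.ldSpanMap[ld]   // ld in [0, MAX_LOGICAL_DRIVES_DYN)
 *
 * therefore stays inside the MR_DRV_RAID_MAP_ALL allocation, but is out
 * of bounds for the declared 'MR_LD_SPAN_MAP [1]' type -- hence the
 * (arguably false-positive) UBSAN reports.
 */
```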
I assume that if it were "struct MR_LD_SPAN_MAP ldSpanMap[0]" it would not trigger the warning? But it also seems like in most (all?) of these cases the code has access to the full MR_DRV_RAID_MAP_ALL anyway. (MR_FW_RAID_MAP and MR_FW_RAID_MAP_ALL seem to be in a similar situation, but I didn't look at them as closely.)
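A small userspace analogue speaks to the [0] question (hypothetical code, nothing here is from the driver). By default, compilers treat a trailing [0] like a flexible array member and don't bounds-instrument it, whereas a trailing [1] is diagnosed by Clang's `-fsanitize=array-bounds` and by GCC's `-fsanitize=bounds-strict`; newer compilers can tighten this with `-fstrict-flex-arrays`:

```c
/*
 * repro.c -- hypothetical analogue of the ldSpanMap layout; not driver code.
 * Build: clang -fsanitize=array-bounds repro.c && ./a.out
 *   (or: gcc -fsanitize=bounds-strict repro.c)
 */
#include <stdio.h>
#include <stdlib.h>

struct entry { int v; };

struct map {                   /* plays the role of MR_DRV_RAID_MAP     */
	int hdr;
	struct entry span[1];  /* change to span[0] (a GNU zero-length
	                        * array) and, with default flags, the
	                        * warning disappears                     */
};

struct map_all {               /* plays the role of MR_DRV_RAID_MAP_ALL */
	struct map m;
	struct entry rest[7];  /* storage for the remaining entries     */
};

int main(void)
{
	struct map_all *all = calloc(1, sizeof(*all));

	if (!all)
		return 1;

	/* Inside the allocation, but outside the declared span[1] type:
	 * UBSAN flags this access (index 2, type 'entry [1]'). */
	all->m.span[2].v = 42;

	printf("%d\n", all->m.span[2].v);
	free(all);
	return 0;
}
```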
That makes a *bit* more sense. Is there something special about the zero-th entry which allows for treating MR_DRV_RAID_MAP_ALL as a MR_DRV_RAID_MAP some of the time? Something like that would explain why this code is set up this way and has persisted this way for so long.

I have started seeing exactly the same problem in the log as well. I only noticed after the controller, for some odd reason, gave up and the kernel obviously started complaining about all my SAS disks and not being able to access them. After a reboot I checked dmesg and then found this UBSAN array-index-out-of-bounds. In my case it is a 9361-8i on a Supermicro C7X99-OCE-F mainboard (BIOS 2.1a, 06/15/2018). I'm running kernel 5.15.35; the UBSAN messages appeared directly at boot.

That makes sense, the code path I saw this on was loading state from firmware. I guess we need to resolve whether/how we can safely "spill over" into the array in this manner. As above, there are some assumptions implicitly being made about the layout of the ldTgtIdToLd map and that the zero-th entry in ldSpanMap is special.

V3 and, hopefully, the final one. :)

https://lore.kernel.org/linux-hardening/cover.1660592640.git.gustavoars@kernel.org/

(In reply to Gustavo A. R. Silva from comment #7)
> V3 and, hopefully, the final one. :)
>
> https://lore.kernel.org/linux-hardening/cover.1660592640.git.gustavoars@kernel.org/

JFYI, this patch series has been taken by the scsi/megaraid maintainers:

https://lore.kernel.org/linux-hardening/yq1k06z8vaw.fsf@ca-mkp.ca.oracle.com/

Is this a serious issue or only a cosmetic one? I'm getting unsure because of "I only noticed after the controller, for some odd reason, gave up and the kernel obviously started complaining about all my SAS disks".

I put our 3108-based MegaRAID controller into JBOD personality mode because I'm using the drives with Proxmox/ZFS (see the excerpt from the manual at https://forum.proxmox.com/threads/megaraid-personality-mode.93857/). When booting, I get this message, but it seems to work fine. In RAID personality mode, this doesn't happen. Should I keep the controller in RAID mode (with JBOD enabled) until the patched kernel is in Proxmox, or can/should I simply ignore these errors?

> is this a serious issue or only cosmetic one?
The original bug report is cosmetic, and the linked patch set is not a functional change: it fixes the cosmetic issue.

There might be other issues in this driver (which should go in another bug, I guess?), but if you're seeing the same backtraces then you're probably just hitting the cosmetic issue.
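For context, the linked series follows the kernel's usual conversion of one-element arrays to C99 flexible array members. Roughly, the shape of the change is as follows (a sketch of the idiom, not the verbatim patches; see the lore thread for the real diffs):

```c
struct MR_DRV_RAID_MAP {
	/* ... */
	struct MR_LD_SPAN_MAP ldSpanMap[]; /* was: ldSpanMap[1] */
};

struct MR_DRV_RAID_MAP_ALL {
	struct MR_DRV_RAID_MAP raidMap;
	/*
	 * The flexible array contributes no size of its own, so the
	 * trailing array grows by one element to keep the overall layout
	 * and sizeof(struct MR_DRV_RAID_MAP_ALL) unchanged.
	 */
	struct MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN]; /* was: ... - 1 */
} __attribute__((packed));

/*
 * raidMap.ldSpanMap[ld] is now an access through a true flexible array
 * member, which the bounds sanitizer does not instrument, so the
 * warnings go away without any functional change.
 */
```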
Seeing this array-index-out-of-bounds on a radeon system:

```
ATOM BIOS: CAPILANO
radeon 0000:02:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
radeon 0000:02:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[drm] Detected VRAM RAM=1024M, BAR=256M
[drm] RAM width 128bits DDR
[drm] radeon: 1024M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
[drm] Loading REDWOOD Microcode
[drm] Internal thermal controller without fan control
================================================================================
UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_atombios.c:2620:43
index 1 is out of range for type 'UCHAR [1]'
CPU: 2 PID: 140 Comm: systemd-udevd Not tainted 6.5.1-060501-generic #202309020842
Hardware name: Acer Aspire 7741/JE70_CP, BIOS V1.26 04/28/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0x48/0x70
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xc6/0x110
 radeon_atombios_parse_power_table_4_5+0x3c9/0x3f0 [radeon]
 radeon_atombios_get_power_modes+0x205/0x210 [radeon]
 radeon_pm_init_dpm+0x8e/0x2f0 [radeon]
 radeon_pm_init+0xd0/0x100 [radeon]
 evergreen_init+0x158/0x400 [radeon]
 radeon_device_init+0x540/0xa90 [radeon]
 radeon_driver_load_kms+0xcc/0x2f0 [radeon]
 drm_dev_register+0x10e/0x240 [drm]
 radeon_pci_probe+0xec/0x180 [radeon]
 local_pci_probe+0x47/0xb0
 pci_call_probe+0x55/0x190
 pci_device_probe+0x84/0x120
 really_probe+0x1c7/0x410
 __driver_probe_device+0x8c/0x180
 driver_probe_device+0x24/0xd0
 __driver_attach+0x10b/0x210
 ? __pfx___driver_attach+0x10/0x10
 bus_for_each_dev+0x8d/0xf0
 driver_attach+0x1e/0x30
 bus_add_driver+0x127/0x240
 driver_register+0x5e/0x130
 ? __pfx_radeon_module_init+0x10/0x10 [radeon]
 __pci_register_driver+0x62/0x70
 radeon_module_init+0x4c/0xff0 [radeon]
 do_one_initcall+0x5e/0x340
 do_init_module+0x68/0x260
 load_module+0xba1/0xcf0
 ? ima_post_read_file+0xe8/0x110
 ? security_kernel_post_read_file+0x75/0x90
 init_module_from_file+0x96/0x100
 ? init_module_from_file+0x96/0x100
 idempotent_init_module+0x11c/0x2b0
 __x64_sys_finit_module+0x64/0xd0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x68/0x90
 ? syscall_exit_to_user_mode+0x37/0x60
 ? do_syscall_64+0x68/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fe86023089d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 0>
RSP: 002b:00007fffe4257b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000560b8b980a10 RCX: 00007fe86023089d
RDX: 0000000000000000 RSI: 00007fe8603b0458 RDI: 0000000000000013
RBP: 00007fe8603b0458 R08: 0000000000000000 R09: 00007fffe4257c30
R10: 0000000000000013 R11: 0000000000000246 R12: 0000000000020000
R13: 0000560b8b978140 R14: 0000000000000000 R15: 0000560b8b90faf0
 </TASK>
================================================================================
```

That's an array-index-out-of-bounds issue, yes, but it doesn't look like it's related to the megaraid issue reported in this ticket. I guess you googled for array-index-out-of-bounds and found this one. UBSAN is generic infrastructure for improving code/kernel quality:

https://www.kernel.org/doc/html/v4.19/dev-tools/ubsan.html

So please post a bug report with the proper assignment/subsystem.