Bug 218708

Summary: Off-by-one vulnerability when reading data from the n_gsm module
Product: Linux Reporter: j51569436
Component: KernelAssignee: Virtual assignee for kernel bugs (linux-kernel)
Status: NEW ---    
Severity: high CC: carnil, daniel.starke, gregkh, jirislaby
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:
Attachments: poc.c for crash
config of kernel

Description j51569436 2024-04-11 01:56:38 UTC
An off-by-one vulnerability occurs in gsm0_receive and gsm1_receive. I'll focus on gsm0_receive for our discussion.


[1] : Write the value to gsm->buf, then increment gsm->count by 1. 
[2] : If gsm->count == gsm->len is reached, stop reading. 


Writing a value to a buffer and then checking its length is typical of off-by-one vulnerabilities. 

```c
static void gsm0_receive(struct gsm_mux *gsm, u8 c)
{
	unsigned int len;

	switch (gsm->state) {
...
	case GSM_DATA:		/* Data */
		gsm->buf[gsm->count++] = c;//[1]
		if (gsm->count == gsm->len) {//[2]
			/* Calculate final FCS for UI frames over all data */
			if ((gsm->control & ~PF) != UIH) {
				gsm->fcs = gsm_fcs_add_block(gsm->fcs, gsm->buf,
							     gsm->count);
			}
			gsm->state = GSM_FCS;
		}
		break;
	case GSM_FCS:		/* FCS follows the packet */
		gsm->fcs = gsm_fcs_add(gsm->fcs, c);
		gsm->state = GSM_SSOF;
		break;
	case GSM_SSOF:
		gsm->state = GSM_SEARCH;
		if (c == GSM0_SOF)
			gsm_queue(gsm);
		else
			gsm->bad_size++;
		break;
	default:
		pr_debug("%s: unhandled state: %d
", __func__, gsm->state);
		break;
	}
}
```

- `gsm->count == gsm->len` should be changed to `(gsm->count+1) == gsm->len`
Comment 1 j51569436 2024-04-11 01:57:49 UTC
BUG: KASAN: slab-out-of-bounds in gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915
Write of size 1 at addr ffff888018aa45dd by task kworker/u5:0/25

CPU: 0 PID: 25 Comm: kworker/u5:0 Not tainted 6.8.0-rc5-dirty #283
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events_unbound flush_to_ldisc
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x37/0x50 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0xc1/0x5e0 mm/kasan/report.c:488
 kasan_report+0xb0/0xe0 mm/kasan/report.c:601
 gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915
 gsmld_receive_buf+0x1d8/0x300 drivers/tty/n_gsm.c:3555
 tty_ldisc_receive_buf+0x149/0x180 drivers/tty/tty_buffer.c:390
 tty_port_default_receive_buf+0x72/0xa0 drivers/tty/tty_port.c:37
 receive_buf drivers/tty/tty_buffer.c:444 [inline]
 flush_to_ldisc+0x221/0x6f0 drivers/tty/tty_buffer.c:494
 process_one_work+0x677/0xf80 kernel/workqueue.c:2633
 process_scheduled_works kernel/workqueue.c:2706 [inline]
 worker_thread+0x7ea/0xff0 kernel/workqueue.c:2787
 kthread+0x281/0x350 kernel/kthread.c:388
 ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
 </TASK>

Allocated by task 1020649:
 kasan_save_stack+0x24/0x50 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:372 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:389
 kmalloc include/linux/slab.h:590 [inline]
 gsm_alloc_mux drivers/tty/n_gsm.c:3220 [inline]
 gsmld_open+0x109/0xd30 drivers/tty/n_gsm.c:3634
 tty_ldisc_open+0x9f/0x120 drivers/tty/tty_ldisc.c:432
 tty_set_ldisc+0x2e7/0x680 drivers/tty/tty_ldisc.c:557
 tiocsetd drivers/tty/tty_io.c:2439 [inline]
 tty_ioctl+0x5c0/0x1590 drivers/tty/tty_io.c:2739
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:871 [inline]
 __se_sys_ioctl fs/ioctl.c:857 [inline]
 __x64_sys_ioctl+0x12d/0x1a0 fs/ioctl.c:857
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc0/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77

The buggy address belongs to the object at ffff888018aa4000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 0 bytes to the right of
 allocated 1501-byte region [ffff888018aa4000, ffff888018aa45dd)

The buggy address belongs to the physical page:
page:ffffea000062a800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x18aa0
head:ffffea000062a800 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888010c42000 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 3810864554, free_ts 3810057898
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x2dd/0x350 mm/page_alloc.c:1533
 prep_new_page mm/page_alloc.c:1540 [inline]
 get_page_from_freelist+0xd0a/0x3570 mm/page_alloc.c:3311
 __alloc_pages+0x21f/0x1f30 mm/page_alloc.c:4567
 __alloc_pages_node include/linux/gfp.h:238 [inline]
 alloc_pages_node include/linux/gfp.h:261 [inline]
 alloc_slab_page mm/slub.c:2190 [inline]
 allocate_slab mm/slub.c:2354 [inline]
 new_slab+0xd3/0x3b0 mm/slub.c:2407
 ___slab_alloc+0x679/0xba0 mm/slub.c:3540
 __slab_alloc mm/slub.c:3625 [inline]
 __slab_alloc_node mm/slub.c:3678 [inline]
 slab_alloc_node mm/slub.c:3850 [inline]
 kmalloc_trace+0x35c/0x3a0 mm/slub.c:4007
 kmalloc include/linux/slab.h:590 [inline]
 kzalloc include/linux/slab.h:711 [inline]
 cec_allocate_adapter+0x78/0x600 drivers/media/cec/core/cec-core.c:268
 vivid_cec_alloc_adap+0xcb/0x130 drivers/media/test-drivers/vivid/vivid-cec.c:323
 vivid_create_instance drivers/media/test-drivers/vivid/vivid-core.c:1926 [inline]
 vivid_probe drivers/media/test-drivers/vivid/vivid-core.c:2018 [inline]
 vivid_probe+0x45a9/0x8e70 drivers/media/test-drivers/vivid/vivid-core.c:2003
 platform_probe+0xd2/0x190 drivers/base/platform.c:1404
 call_driver_probe drivers/base/dd.c:579 [inline]
 really_probe+0x1cd/0xb30 drivers/base/dd.c:658
 __driver_probe_device+0x184/0x450 drivers/base/dd.c:800
 driver_probe_device+0x44/0x120 drivers/base/dd.c:830
 __driver_attach+0x1db/0x4a0 drivers/base/dd.c:1216
 bus_for_each_dev+0xec/0x170 drivers/base/bus.c:368
 bus_add_driver+0x299/0x570 drivers/base/bus.c:673
page last free pid 1 tgid 1 stack trace:
 create_dummy_stack mm/page_owner.c:69 [inline]
 register_dummy_stack+0x61/0xb0 mm/page_owner.c:75
 init_page_owner+0x25/0x950 mm/page_owner.c:93
 invoke_init_callbacks mm/page_ext.c:135 [inline]
 page_ext_init+0x51a/0x740 mm/page_ext.c:480
 mm_core_init+0x4e5/0x5e0 mm/mm_init.c:2788
Comment 2 gregkh 2024-04-11 04:39:13 UTC
On Thu, Apr 11, 2024 at 01:56:38AM +0000, bugzilla-daemon@kernel.org wrote:
> An off-by-one vulnerability occurs in gsm0_receive and gsm1_receive. I'll
> focus
> on gsm0_receive for our discussion.

Can you send a patch to the linux-serial@vger.kernel.org list to help
resolve this issue?
Comment 3 Daniel Starke 2024-04-12 05:47:51 UTC
Case gsm0_receive:
- gsm->len is built from the first and maybe also second length octet
- gsm->len is checked against gsm->mru (gsm->buf is allocated with mru + 1)
- gsm->count is reset to zero
- all conscious octets are added to the buffer, each increasing the next
  write position via gsm->count
- this finishes if gsm->count equals to gsm->len

Simple example using numbers:
- gsm->len = 2
- gsm->count = 0
- gsm->buf[0] = data
- gsm->count++ // 1
- gsm->buf[1] = data
- gsm->count++ // 2
- gsm->count == gsm->len == 2 -> end of data

This is more or less the same like:
for (int i = 0; i < len; i++) {
	buf[i] = data;
}

Hence, no issue. There can be no off-by-one visible to KASAN even if gsm->count
hits one past gsm->mru as gsm->buf was allocated with gsm->mru + 1.
Actually, performing the suggested change will discard the last valid data
octet in each frame.

Case gsm1_receive:
- gsm->count is reset to zero after reaching the control field of a frame
- all conscious octets are added to the buffer, each increasing the next
  write position via gsm->count
- this finishes if gsm->count gets one larger than gsm->mru or SOF was found
  (gsm->buf is allocated with mru + 1)
- the last octet is taken as fcs if SOF was found beforehand and the
  gsm->count gets decreased accordingly

For me this looks like another place corrupted our data.
Comment 4 Jiri Slaby 2024-04-12 05:56:45 UTC
(In reply to j51569436 from comment #1)
> BUG: KASAN: slab-out-of-bounds in gsm0_receive+0xa4b/0xaf0
> drivers/tty/n_gsm.c:2915
> Write of size 1 at addr ffff888018aa45dd by task kworker/u5:0/25
...
>  invoke_init_callbacks mm/page_ext.c:135 [inline]
>  page_ext_init+0x51a/0x740 mm/page_ext.c:480
>  mm_core_init+0x4e5/0x5e0 mm/mm_init.c:2788

Isn't there "Memory state around the buggy address:" part which you stripped away?
Comment 5 j51569436 2024-04-12 06:07:30 UTC
I apologize for my mistake. I have a different suggestion. 
In GSM_LEN1, check if gsm->len is greater than gsm->mru. But don't subtract gsm->len which is already added, this seems to be the real cause. 

```
	case GSM_LEN1:
		gsm->fcs = gsm_fcs_add(gsm->fcs, c);
		len = c;
		gsm->len |= len << 7;
		if (gsm->len > gsm->mru) {
			gsm->bad_size++;
			gsm->state = GSM_SEARCH;
			break;
		}
		gsm->count = 0;
		if (!gsm->len)
			gsm->state = GSM_FCS;
		else
			gsm->state = GSM_DATA;
		break;

```
Comment 6 j51569436 2024-04-12 06:09:01 UTC
Oh, sorry. Please ignore the issue with GSM_LEN1 that I just mentioned. I'll analyze it again and share my findings.
Comment 7 Jiri Slaby 2024-04-12 06:11:49 UTC
        if (c->mru > MAX_MRU || c->mtu > MAX_MTU)
                return -EINVAL;

c->mru is never constraint-checked AFAICS.
Comment 8 j51569436 2024-04-12 06:12:59 UTC
Below is the full report from syzkaller. 


==================================================================
BUG: KASAN: slab-out-of-bounds in gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915
Write of size 1 at addr ffff888018aa45dd by task kworker/u5:0/25

CPU: 0 PID: 25 Comm: kworker/u5:0 Not tainted 6.8.0-rc5-dirty #283
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events_unbound flush_to_ldisc
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x37/0x50 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0xc1/0x5e0 mm/kasan/report.c:488
 kasan_report+0xb0/0xe0 mm/kasan/report.c:601
 gsm0_receive+0xa4b/0xaf0 drivers/tty/n_gsm.c:2915
 gsmld_receive_buf+0x1d8/0x300 drivers/tty/n_gsm.c:3555
 tty_ldisc_receive_buf+0x149/0x180 drivers/tty/tty_buffer.c:390
 tty_port_default_receive_buf+0x72/0xa0 drivers/tty/tty_port.c:37
 receive_buf drivers/tty/tty_buffer.c:444 [inline]
 flush_to_ldisc+0x221/0x6f0 drivers/tty/tty_buffer.c:494
 process_one_work+0x677/0xf80 kernel/workqueue.c:2633
 process_scheduled_works kernel/workqueue.c:2706 [inline]
 worker_thread+0x7ea/0xff0 kernel/workqueue.c:2787
 kthread+0x281/0x350 kernel/kthread.c:388
 ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
 </TASK>

Allocated by task 1020649:
 kasan_save_stack+0x24/0x50 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:372 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:389
 kmalloc include/linux/slab.h:590 [inline]
 gsm_alloc_mux drivers/tty/n_gsm.c:3220 [inline]
 gsmld_open+0x109/0xd30 drivers/tty/n_gsm.c:3634
 tty_ldisc_open+0x9f/0x120 drivers/tty/tty_ldisc.c:432
 tty_set_ldisc+0x2e7/0x680 drivers/tty/tty_ldisc.c:557
 tiocsetd drivers/tty/tty_io.c:2439 [inline]
 tty_ioctl+0x5c0/0x1590 drivers/tty/tty_io.c:2739
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:871 [inline]
 __se_sys_ioctl fs/ioctl.c:857 [inline]
 __x64_sys_ioctl+0x12d/0x1a0 fs/ioctl.c:857
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc0/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77

The buggy address belongs to the object at ffff888018aa4000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 0 bytes to the right of
 allocated 1501-byte region [ffff888018aa4000, ffff888018aa45dd)

The buggy address belongs to the physical page:
page:ffffea000062a800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x18aa0
head:ffffea000062a800 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888010c42000 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 3810864554, free_ts 3810057898
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x2dd/0x350 mm/page_alloc.c:1533
 prep_new_page mm/page_alloc.c:1540 [inline]
 get_page_from_freelist+0xd0a/0x3570 mm/page_alloc.c:3311
 __alloc_pages+0x21f/0x1f30 mm/page_alloc.c:4567
 __alloc_pages_node include/linux/gfp.h:238 [inline]
 alloc_pages_node include/linux/gfp.h:261 [inline]
 alloc_slab_page mm/slub.c:2190 [inline]
 allocate_slab mm/slub.c:2354 [inline]
 new_slab+0xd3/0x3b0 mm/slub.c:2407
 ___slab_alloc+0x679/0xba0 mm/slub.c:3540
 __slab_alloc mm/slub.c:3625 [inline]
 __slab_alloc_node mm/slub.c:3678 [inline]
 slab_alloc_node mm/slub.c:3850 [inline]
 kmalloc_trace+0x35c/0x3a0 mm/slub.c:4007
 kmalloc include/linux/slab.h:590 [inline]
 kzalloc include/linux/slab.h:711 [inline]
 cec_allocate_adapter+0x78/0x600 drivers/media/cec/core/cec-core.c:268
 vivid_cec_alloc_adap+0xcb/0x130 drivers/media/test-drivers/vivid/vivid-cec.c:323
 vivid_create_instance drivers/media/test-drivers/vivid/vivid-core.c:1926 [inline]
 vivid_probe drivers/media/test-drivers/vivid/vivid-core.c:2018 [inline]
 vivid_probe+0x45a9/0x8e70 drivers/media/test-drivers/vivid/vivid-core.c:2003
 platform_probe+0xd2/0x190 drivers/base/platform.c:1404
 call_driver_probe drivers/base/dd.c:579 [inline]
 really_probe+0x1cd/0xb30 drivers/base/dd.c:658
 __driver_probe_device+0x184/0x450 drivers/base/dd.c:800
 driver_probe_device+0x44/0x120 drivers/base/dd.c:830
 __driver_attach+0x1db/0x4a0 drivers/base/dd.c:1216
 bus_for_each_dev+0xec/0x170 drivers/base/bus.c:368
 bus_add_driver+0x299/0x570 drivers/base/bus.c:673
page last free pid 1 tgid 1 stack trace:
 create_dummy_stack mm/page_owner.c:69 [inline]
 register_dummy_stack+0x61/0xb0 mm/page_owner.c:75
 init_page_owner+0x25/0x950 mm/page_owner.c:93
 invoke_init_callbacks mm/page_ext.c:135 [inline]
 page_ext_init+0x51a/0x740 mm/page_ext.c:480
 mm_core_init+0x4e5/0x5e0 mm/mm_init.c:2788

Memory state around the buggy address:
 ffff888018aa4480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888018aa4500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888018aa4580: 00 00 00 00 00 00 00 00 00 00 00 05 fc fc fc fc
                                                    ^
 ffff888018aa4600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888018aa4680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Comment 9 Jiri Slaby 2024-04-12 06:15:52 UTC
(In reply to Jiri Slaby from comment #7)
>         if (c->mru > MAX_MRU || c->mtu > MAX_MTU)
>                 return -EINVAL;
> 
> c->mru is never constraint-checked AFAICS.

Scratch that, it's on another line and I misread, hopefully ;):
        if (c->mru < MIN_MTU || c->mtu < MIN_MTU)
                return -EINVAL;
        if (c->mru > MAX_MRU || c->mtu > MAX_MTU)


I somehow expected mtu to be checked on one line and mru on the other...
Comment 10 Jiri Slaby 2024-04-12 06:17:16 UTC
(In reply to j51569436 from comment #8)
> Below is the full report from syzkaller. 

BTW can you let syzkaller create a reproducer for this? I don't remember the details, it was like running syz-repro somehow.
Comment 11 Daniel Starke 2024-04-12 06:20:49 UTC
We could harden n_gsm against memory corruption for example by changing
		if (gsm->count == gsm->len) {
to
		if (gsm->count >= gsm->len) {

But this does not change the fact that our gsm->count field got corrupted.

P.S.: We would see data integrity issues all over the place if there was a logical
issue with the frame data length handling.
Comment 12 Daniel Starke 2024-04-12 06:23:00 UTC
Note that corruption of the gsm->mru or gsm->len field is also possible.
Comment 13 Jiri Slaby 2024-04-12 06:23:09 UTC
(In reply to Jiri Slaby from comment #10)
> (In reply to j51569436 from comment #8)
> > Below is the full report from syzkaller. 
> 
> BTW can you let syzkaller create a reproducer for this? I don't remember the
> details, it was like running syz-repro somehow.

Precisely:
./syz-repro -config <my.cfg> <crash-qemu-1-1455745459265726910>

where <my.cfg> is your config and <crash-qemu-1-1455745459265726910> is the crash id you should see on the web status page or in the log.
Comment 14 Jiri Slaby 2024-04-12 06:26:06 UTC
(In reply to Daniel Starke from comment #11)
> We could harden n_gsm against memory corruption for example by changing
>               if (gsm->count == gsm->len) {
> to
>               if (gsm->count >= gsm->len) {

(In reply to Daniel Starke from comment #12)
> Note that corruption of the gsm->mru or gsm->len field is also possible.

This would help to track this down:
if (gsm->count > MAX_MRU)
  pr_info("%s: %u %u %u\n", __func__, gsm->count, gsm->mru, gsm->len);

How often syzkaller hits this? Can you add this to "case GSM_DATA"?

If we had a repro, it would be super easy to debug.
Comment 15 Jiri Slaby 2024-04-12 06:46:21 UTC
(In reply to Daniel Starke from comment #12)
> Note that corruption of the gsm->mru or gsm->len field is also possible.

I couldn't find the issue in the code too. Provided the path comes from an asynchronous work:
> receive_buf drivers/tty/tty_buffer.c:444 [inline]
> flush_to_ldisc+0x221/0x6f0 drivers/tty/tty_buffer.c:494

another option is a race between cleanup_mux and flush_to_lidsc. Could gsm1_receive have switched to gsm0_receive by gsm_config() in the meantime?
Comment 16 Daniel Starke 2024-04-12 07:27:35 UTC
(In reply to Jiri Slaby from comment #15)
> another option is a race between cleanup_mux and flush_to_lidsc. Could
> gsm1_receive have switched to gsm0_receive by gsm_config() in the meantime?

A race condition looks possible here.

I recommend resetting gsm->state to GSM_SEARCH each time gsm->receive is changed.
Maybe we also need to guard against parallel execution of gsm->receive and its
change?
Comment 17 j51569436 2024-04-12 07:45:42 UTC

syzkaller succeeds in reproducing the crash, but fails to compile the final output because of the functions I put in as customizations. I am manually creating compilable POC code from the intermediate output provided by syzkaller.

I will share the important parts of the intermediate result code with you guys. 


unsigned long syz_ext_open_gsm()
{
        char* pts;
        long master_fd;
        int slave_fd;
        int arg;
        master_fd = open("/dev/ptmx", O_RDWR | O_CLOEXEC);
        grantpt(master_fd);
        unlockpt(master_fd);
        pts = ptsname(master_fd);
        printf("ptsname %s\n", pts);
        slave_fd = open(pts, O_RDWR | O_CLOEXEC);
        arg = N_GSM0710;
        ioctl(slave_fd, TIOCSETD, &arg);
        return (master_fd << 32) | slave_fd;
}

int syz_ext_write_master(long fd, char* buffer, size_t len)
{
        fd = fd >> 32;
        return write(fd, buffer, len);
}

uint64_t r[1] = {0x0};

void execute_call(int call)
{
                intptr_t res = 0;
        switch (call) {
        case 0:
                res = -1;
                NONFAILING(res = syz_ext_open_gsm());
                if (res != -1)
                                r[0] = res;
                break;
        case 1:
                NONFAILING(*(uint32_t*)0x20000140 = 1);
                NONFAILING(*(uint32_t*)0x20000144 = 0);
                NONFAILING(*(uint32_t*)0x20000148 = 0);
                NONFAILING(*(uint32_t*)0x2000014c = 0);
                NONFAILING(*(uint32_t*)0x20000150 = 0);
                NONFAILING(*(uint32_t*)0x20000154 = 0);
                NONFAILING(*(uint32_t*)0x20000158 = 0);
                NONFAILING(*(uint32_t*)0x2000015c = 0x3df);
                NONFAILING(*(uint32_t*)0x20000160 = 0xa4);
                NONFAILING(*(uint32_t*)0x20000164 = 6);
                NONFAILING(*(uint32_t*)0x20000168 = 2);
                NONFAILING(memset((void*)0x2000016c, 0, 32));
                syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0x404c4701, /*arg=*/0x20000140ul);
                break;
        case 2:
                NONFAILING(memcpy((void*)0x20000080, "\xfc\xbf\xa4\x11\x35\x47\xd6\xe8\x99\x85\x52\xab\xe3\x09\x8a\xf1\xa8\xa6\x57\x90\x66\xad\x62\xa4\x8d\xb7\x33\x6a\x12\xdd\xf7\x88\xbb\xbe\xd5\xce\xcc\x27\xba\x75\x93\x62\xab\x5c\xf4\x78\x98\x7e\x26\x21\x62\x88\xff\xe5\xab", 55));
                NONFAILING(syz_ext_write_master(/*fd=*/r[0], /*data=*/0x20000080, /*len=*/0x37));
                {
                int i;
                for(i = 0; i < 32; i++) {
                NONFAILING(syz_ext_write_master(/*fd=*/r[0], /*data=*/0x20000080, /*len=*/0x37));
                                }
                }
                break;
        }

}
Comment 18 j51569436 2024-04-12 07:47:27 UTC
static void* thr(void* arg)
{
        struct thread_t* th = (struct thread_t*)arg;
        for (;;) {
                event_wait(&th->ready);
                event_reset(&th->ready);
                execute_call(th->call);
                __atomic_fetch_sub(&running, 1, __ATOMIC_RELAXED);
                event_set(&th->done);
        }
        return 0;
}


static void execute_one(void)
{
        if (write(1, "executing program\n", sizeof("executing program\n") - 1)) {
        }
        int i, call, thread;
        for (call = 0; call < 3; call++) {
                for (thread = 0; thread < (int)(sizeof(threads) / sizeof(threads[0])); thread++) {
                        struct thread_t* th = &threads[thread];
                        if (!th->created) {
                                th->created = 1;
                                event_init(&th->ready);
                                event_init(&th->done);
                                event_set(&th->done);
                                thread_start(thr, th);
                        }
                        if (!event_isset(&th->done))
                                continue;
                        event_reset(&th->done);
                        th->call = call;
                        __atomic_fetch_add(&running, 1, __ATOMIC_RELAXED);
                        event_set(&th->ready);
                        if (call == 1)
                                break;
                        event_timedwait(&th->done, 50);
                        break;
                }
        }
        for (i = 0; i < 100 && __atomic_load_n(&running, __ATOMIC_RELAXED); i++)
                sleep_ms(1);
        close_fds();
}
Comment 19 j51569436 2024-04-12 07:47:54 UTC
I understand that it's hard for you to analyze this, so just take it as a guide.
Comment 20 Jiri Slaby 2024-04-12 07:51:15 UTC
(In reply to j51569436 from comment #17)
>         case 1:
>                 NONFAILING(*(uint32_t*)0x20000140 = 1);
>                 NONFAILING(*(uint32_t*)0x20000144 = 0);
>                 NONFAILING(*(uint32_t*)0x20000148 = 0);
>                 NONFAILING(*(uint32_t*)0x2000014c = 0);
>                 NONFAILING(*(uint32_t*)0x20000150 = 0);
>                 NONFAILING(*(uint32_t*)0x20000154 = 0);
>                 NONFAILING(*(uint32_t*)0x20000158 = 0);
>                 NONFAILING(*(uint32_t*)0x2000015c = 0x3df);
>                 NONFAILING(*(uint32_t*)0x20000160 = 0xa4);
>                 NONFAILING(*(uint32_t*)0x20000164 = 6);
>                 NONFAILING(*(uint32_t*)0x20000168 = 2);
>                 NONFAILING(memset((void*)0x2000016c, 0, 32));
>                 syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0x404c4701,
> /*arg=*/0x20000140ul);

GSMIOC_SETCONF with the above struct gsm_config fields set.

>                 break;
>         case 2:
>                 NONFAILING(memcpy((void*)0x20000080,
> "\xfc\xbf\xa4\x11\x35\x47\xd6\xe8\x99\x85\x52\xab\xe3\x09\x8a\xf1\xa8\xa6\x57
> \x90\x66\xad\x62\xa4\x8d\xb7\x33\x6a\x12\xdd\xf7\x88\xbb\xbe\xd5\xce\xcc\x27\
> xba\x75\x93\x62\xab\x5c\xf4\x78\x98\x7e\x26\x21\x62\x88\xff\xe5\xab", 55));

Simple write, so this ends up in receive.

>                 NONFAILING(syz_ext_write_master(/*fd=*/r[0],
> /*data=*/0x20000080, /*len=*/0x37));
             }

So GSMIOC_SETCONF racing with write?
Comment 21 j51569436 2024-04-12 07:58:58 UTC
Created attachment 306136 [details]
poc.c for crash
Comment 22 j51569436 2024-04-12 08:00:50 UTC
POC.C that is causing the crash. It has a lot of unnecessary parts, but it shouldn't be hard to recognize the parts that crash.
Comment 23 Jiri Slaby 2024-04-12 08:15:10 UTC
(In reply to Jiri Slaby from comment #20)
> (In reply to j51569436 from comment #17)
> >         case 1:
> >                 NONFAILING(*(uint32_t*)0x20000140 = 1);
> >                 NONFAILING(*(uint32_t*)0x20000144 = 0);

encapsulation = 0. So no change of ->receive.

> >                 NONFAILING(*(uint32_t*)0x20000148 = 0);
> >                 NONFAILING(*(uint32_t*)0x2000014c = 0);
> >                 NONFAILING(*(uint32_t*)0x20000150 = 0);
> >                 NONFAILING(*(uint32_t*)0x20000154 = 0);
> >                 NONFAILING(*(uint32_t*)0x20000158 = 0);
> >                 NONFAILING(*(uint32_t*)0x2000015c = 0x3df);

mru = 991, so is increased.

I still fail to see how count won't meet with len at some point when mru is only increased. Leaving up to Daniel to debug :). It should be much easier with poc -- thanks j51569436!
Comment 24 Daniel Starke 2024-04-12 11:48:33 UTC
What is the .config for the tested kernel?

I am unable to run poc.c on my test system due to missing features of the kernel:
write to /sys/kernel/debug/x86/nmi_longest_ns failed: No such file or directory
write to /proc/sys/kernel/hung_task_check_interval_secs failed: No such file or directory
write to /proc/sys/net/core/bpf_jit_kallsyms failed: No such file or directory
write to /proc/sys/net/core/bpf_jit_harden failed: No such file or directory
write to /proc/sys/kernel/softlockup_all_cpu_backtrace failed: No such file or directory
[   14.107467] cgroup: Unknown subsys name 'net'
[   14.137585] cgroup: Unknown subsys name 'rlimit'
[   14.138573] cgroup: Unknown subsys name 'memory'
Comment 25 j51569436 2024-04-12 14:08:50 UTC
git commit : b401b621758e46812da61fa58a67c3fd8d91de0d
Comment 26 j51569436 2024-04-12 14:09:04 UTC
Created attachment 306140 [details]
config of kernel
Comment 27 Daniel Starke 2024-04-16 09:31:58 UTC
I had poc.c running for more than 5h on 6.9.0-rc3 (tty-next) with the given .config but I am unable to reproduce the issue.
In the meanwhile, I was able to create an application which gives the same error. However, I am unsure if both are caused by the same events.
In my case I write data to the input of the n_gsm tty while switching between basic and advanced option mode to force a write past gsm->len and past gsm->buf.
If j51569436 is able to reproduce the issue with poc.c, please try to see if it still occurs after changing
		if (gsm->count == gsm->len) {
to
		if (gsm->count >= gsm->len) {
in gsm0_receive().

I will prepare a patch for my case.
Comment 28 j51569436 2024-04-18 08:22:00 UTC
I haven't analyzed the exact cause of the vulnerability, so I'm not sure if this is the right patch, but at least with this patch, the crash doesn't happen.