Bug 209407 - cdc_ncm_fill_tx_frame calls memset with huge length when called from cdc_mbim_tx_fixup
Summary: cdc_ncm_fill_tx_frame calls memset with huge length when called from cdc_mbim...
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-27 15:34 UTC by Jouni Seppänen
Modified: 2021-01-22 12:20 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.10.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Debug information including the dmesg dump (120.73 KB, text/plain)
2020-09-27 15:34 UTC, Jouni Seppänen
Details
dmesg from 5.9.13 (53.24 KB, text/plain)
2020-12-31 06:07 UTC, Jouni Seppänen
Details
log messages from 5.10.4 (5.71 KB, text/plain)
2020-12-31 12:21 UTC, Jouni Seppänen
Details
KASAN output (7.74 KB, text/plain)
2021-01-01 08:30 UTC, Jouni Seppänen
Details

Description Jouni Seppänen 2020-09-27 15:34:36 UTC
Created attachment 292681 [details]
Debug information including the dmesg dump

I'm running a PCEngines APU3C4 system as a router, with a Huawei
909s-120 LTE module for the upstream connection. It has been crashing
once every few days. I found a way to cause the crash (or at least one
case of it) quite reliably by transferring a large file from a host to
AWS S3 via the router. This happened initially on Debian's kernel
4.19.132-1 but I can reproduce it on an unmodified 5.8.12. The
following dumps are all from 5.8.12.

Some lines from the dmesg dump (the full dump in the attachment along
with other debug information):

> [ 2297.135441] skbuff: skb_over_panic: text:ffffffffc09bf27a len:122 put:122
> head:0000000000000000 data:0000000000000000 tail:0x7a end:0x0 dev:<NULL>
> [ 2297.137499] BUG: unable to handle page fault for address: ffff90b01631c000
> [ 2297.167354] Call Trace:
> [ 2297.167369]  <IRQ>
> [ 2297.167409]  cdc_ncm_fill_tx_frame+0x83a/0x970 [cdc_ncm]
> [ 2297.306125]  cdc_mbim_tx_fixup+0x1d9/0x240 [cdc_mbim]
> [ 2297.311578]  usbnet_start_xmit+0x5d/0x720 [usbnet]
> [ 2297.316682]  dev_hard_start_xmit+0x93/0x1f0

cdc_ncm_fill_tx_frame+0x83a is in cdc_ncm_fill_tx_frame (./include/linux/skbuff.h:2252):

> 2246  static inline void *skb_put_zero(struct sk_buff *skb, unsigned int len)
> 2247  {
> 2248          void *tmp = skb_put(skb, len);
> 2249
> 2250          memset(tmp, 0, len);
> 2251
> 2252          return tmp;
> 2253  }

cdc_mbim_tx_fixup+0x1d9 is in cdc_mbim_tx_fixup (drivers/net/usb/cdc_mbim.c:292):

> 288                   }
> 289           }
> 290
> 291           spin_lock_bh(&ctx->mtx);
> 292           skb_out = cdc_ncm_fill_tx_frame(dev, skb, sign);
> 293           spin_unlock_bh(&ctx->mtx);
> 294           return skb_out;


My corresponding Debian bug report is at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970736
Comment 1 Jouni Seppänen 2020-12-31 06:05:39 UTC
Still occurs in 5.9.13. The BUG log message is different but the call trace still shows cdc_ncm_fill_tx_frame.

[163049.133074] BUG: unable to handle page fault for address: ffff8d3e47000000
[163049.140062] #PF: supervisor write access in kernel mode
[163049.145385] #PF: error_code(0x0003) - permissions violation
[163049.151052] PGD afa01067 P4D afa01067 PUD 12efff067 PMD 11a27f063 PTE 80000000c7000161
[163049.159073] Oops: 0003 [#1] SMP NOPTI
[163049.162839] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G            E     5.9.13 #1
[163049.171454] Hardware name: PC Engines apu3/apu3, BIOS v4.12.0.2 06/28/2020
[163049.178437] RIP: 0010:__memset+0x24/0x30
    ------8<------
[163049.263988] Call Trace:
[163049.266552]  <IRQ>
[163049.268682]  cdc_ncm_fill_tx_frame+0x83a/0x970 [cdc_ncm]
[163049.274124]  cdc_mbim_tx_fixup+0x1d9/0x240 [cdc_mbim]
[163049.279301]  usbnet_start_xmit+0x5d/0x720 [usbnet]
Comment 2 Jouni Seppänen 2020-12-31 06:07:57 UTC
Created attachment 294441 [details]
dmesg from 5.9.13
Comment 3 Jouni Seppänen 2020-12-31 12:20:04 UTC
Occurs in 5.10.4 too. Unfortunately the kdump tools on my system are failing to write the dmesg dump for this kernel version, but I got some log messages via a 
serial console.

[  649.351619] BUG: unable to handle page fault for address: ffff8c3302243000
[  649.358522] #PF: supervisor write access in kernel mode
[  649.363766] #PF: error_code(0x0003) - permissions violation
[  649.369346] PGD 12e801067 P4D 12e801067 PUD 12e805067 PMD 10224c063 PTE 8000000102243161
[  649.377465] Oops: 0003 [#1] SMP NOPTI
[  649.381150] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 5.10.4 #1
[  649.388292] Hardware name: PC Engines apu3/apu3, BIOS v4.12.0.2 06/28/2020
[  649.395188] RIP: 0010:__memset+0x24/0x30
   --- 8< ---
[  649.460541] kernel BUG at lib/list_debug.c:53!
[  649.468906] list_del corruption. prev->next should be ffff9c260153beb8, but was 0000000000000000
[  649.468938] ------------[ cut here ]------------
[  649.468943] kernel BUG at lib/list_debug.c:53!
[  649.515815] Call Trace:
[  649.518294]  <IRQ>
[  649.520353]  cdc_ncm_fill_tx_frame+0x821/0x950 [cdc_ncm]
[  649.525722]  cdc_mbim_tx_fixup+0x1fb/0x260 [cdc_mbim]
[  649.530822]  usbnet_start_xmit+0x71/0x740 [usbnet]

The failing kdump-tools output is

[   53.583779] kdump-tools[844]: The kernel version is not supported.
[   53.601995] kdump-tools[844]: The makedumpfile operation may be incomplete.
[   53.610833] kdump-tools[844]: The dumpfile is saved to /var/crash/202012311403/dump-incomplete.
[   53.629930] kdump-tools[844]: makedumpfile Completed.
[   53.651012] kdump-tools[844]: kdump-tools: saved vmcore in /var/crash/202012311403.
[   57.611244] kdump-tools[844]: running makedumpfile --dump-dmesg /proc/vmcore /var/crash/202012311403/dmesg.202012311403.
[   57.630883] kdump-tools[844]: dump_dmesg: Can't find some symbols for log_buf.
[   57.649968] kdump-tools[844]: The kernel version is not supported.
[   57.665919] kdump-tools[844]: The makedumpfile operation may be incomplete.
[   57.681790] kdump-tools[844]: makedumpfile Failed.
[   57.704245] kdump-tools[844]: kdump-tools: makedumpfile --dump-dmesg failed. dmesg content will be unavailable ... failed!
[   57.727124] kdump-tools[844]: kdump-tools: failed to save dmesg content in /var/crash/202012311403 ... failed!
Comment 4 Jouni Seppänen 2020-12-31 12:21:09 UTC
Created attachment 294447 [details]
log messages from 5.10.4
Comment 5 Jouni Seppänen 2021-01-01 08:29:18 UTC
I compiled with KASAN and UBSAN and ran my test case:

[  712.401555] ==================================================================
[  712.409173] BUG: KASAN: slab-out-of-bounds in cdc_ncm_fill_tx_frame+0x2776/0x3b10 [cdc_ncm]
[  712.418199] Write of size 4294967294 at addr ffff888109964002 by task ksoftirqd/2/21
 --- 8< ---
[  758.368348] RIP: 0010:__memset+0x24/0x30
 --- 8< ---
[  758.457314] Call Trace:
[  758.460029]  cdc_ncm_fill_tx_frame+0x2776/0x3b10 [cdc_ncm]
[  758.466067]  cdc_mbim_tx_fixup+0x576/0x730 [cdc_mbim]
[  758.471549]  ? skb_mac_gso_segment+0x1e7/0x3d0
[  758.476427]  ? cdc_mbim_rx_kill_vid+0x50/0x50 [cdc_mbim]
[  758.482075]  ? skb_network_protocol+0x510/0x510
[  758.487046]  usbnet_start_xmit+0x13b/0x2650 [usbnet]
[  758.492510]  ? do_raw_spin_lock+0x126/0x2a0
[  758.497068]  dev_hard_start_xmit+0x15d/0x500
[  758.501604]  sch_direct_xmit+0x222/0xa60
[  758.505842]  ? dev_reset_queue.constprop.21+0x100/0x100
[  758.511509]  ? do_raw_spin_lock+0x126/0x2a0
[  758.516012]  ? rwlock_bug.part.1+0x90/0x90
[  758.520418]  __dev_queue_xmit+0x130f/0x1f80
[  758.524977]  ? netdev_core_pick_tx+0x2e0/0x2e0
[  758.529854]  ? __nft_trace_packet+0x1c0/0x1c0 [nf_tables]
[  758.535689]  ? rt_cache_valid+0xec/0x160
[  758.539967]  ? memset+0x2d/0x70
[  758.543298]  ? __xfrm_decode_session+0x179/0x2db0
[  758.548359]  ip_finish_output2+0x975/0x1b90
[  758.552884]  ? __ip_flush_pending_frames.isra.11+0x2f0/0x2f0
[  758.558967]  ? 0xffffffffc0e68000
[  758.562599]  __ip_finish_output+0x380/0x870
[  758.567141]  ? __nft_set_pktinfo_ipv6_validate+0x4b0/0x4b0 [nf_tables]
[  758.574261]  ? ip_fragment.constprop.20+0x220/0x220
[  758.579588]  ? nf_confirm+0x2b1/0x390 [nf_conntrack]
[  758.585081]  ? nf_hook_slow+0xc5/0x190
[  758.589048]  ip_output+0x1a4/0x300
[  758.592663]  ? ip_mc_output+0x7d0/0x7d0
[  758.596844]  ? __ip_finish_output+0x870/0x870
[  758.601647]  ? nf_hook_slow+0xc5/0x190
[  758.605643]  ip_forward+0xf92/0x1b90
[  758.609417]  ? nf_nat_ipv4_in+0xc7/0x200 [nf_nat]
[  758.614452]  ? ip_forward_finish+0x260/0x260
[  758.619129]  ? ip_route_input_noref+0x74/0xb0
[  758.623809]  ? ip_route_input_rcu+0xbf0/0xbf0
[  758.628500]  ? ip4_key_hashfn+0x200/0x200
[  758.632860]  ip_sublist_rcv_finish+0x112/0x1d0
[  758.637636]  ip_sublist_rcv+0x3f6/0x650
[  758.641858]  ? ip_rcv_core+0xcc0/0xcc0
[  758.645942]  ? tpacket_rcv+0x35f0/0x35f0
[  758.650260]  ? ip_rcv_finish_core.isra.3+0x1640/0x1640
[  758.655752]  ? ip_rcv_core+0x5c2/0xcc0
[  758.659704]  ? do_xdp_generic+0x20/0x20
[  758.663941]  ip_list_rcv+0x2a5/0x3e0
[  758.667888]  ? ip_rcv+0x1e0/0x1e0
[  758.671443]  __netif_receive_skb_list_core+0x660/0x8f0
[  758.677050]  ? napi_gro_receive+0x23f/0x2b0
[  758.681553]  ? process_backlog+0x640/0x640
[  758.685930]  ? kasan_save_stack+0x19/0x40
[  758.690269]  ? kasan_set_track+0x1c/0x30
[  758.694500]  ? kasan_set_free_info+0x1b/0x30
[  758.699230]  ? __kasan_slab_free+0x106/0x150
[  758.703808]  ? kmem_cache_free+0xde/0x450
[  758.708064]  ? napi_gro_receive+0x23f/0x2b0
[  758.712624]  ? igb_poll+0x114e/0x5000 [igb]
[  758.717116]  ? __do_softirq+0x1a6/0x605
[  758.721188]  ? run_ksoftirqd+0x1a/0x30
[  758.725326]  ? smpboot_thread_fn+0x28b/0x660
[  758.729862]  ? kthread+0x313/0x410
[  758.733511]  ? ret_from_fork+0x22/0x30
[  758.737461]  netif_receive_skb_list_internal+0x5d5/0xc00
[  758.743055]  ? stack_trace_save+0xb0/0xb0
[  758.747345]  ? __netif_receive_skb_list_core+0x8f0/0x8f0
[  758.753066]  ? inet_gro_receive+0x1a9/0x1270
[  758.757615]  ? gro_normal_one+0x65/0x250
[  758.761870]  ? napi_gro_complete.constprop.84+0x124/0x2e0
[  758.767709]  ? napi_gro_flush+0x253/0x3e0
[  758.772037]  napi_complete_done+0x18c/0x600
[  758.776556]  igb_poll+0x1a5d/0x5000 [igb]
[  758.780804]  ? update_rq_clock+0xc3/0x520
[  758.785111]  ? newidle_balance+0x893/0xdb0
[  758.789683]  ? igb_alloc_rx_buffers+0x870/0x870 [igb]
[  758.795041]  net_rx_action+0x333/0xcf0
[  758.799088]  ? napi_busy_loop+0x580/0x580
[  758.803372]  ? rcu_note_context_switch+0x1e0/0x1e0
[  758.808615]  ? do_raw_spin_unlock+0x54/0x270
[  758.813247]  ? _raw_spin_unlock_irqrestore+0x10/0x20
[  758.818540]  __do_softirq+0x1a6/0x605
[  758.822468]  ? perf_trace_irq_handler_entry+0x500/0x500
[  758.828099]  run_ksoftirqd+0x1a/0x30
[  758.831984]  smpboot_thread_fn+0x28b/0x660
[  758.836458]  ? smpboot_register_percpu_thread+0x1a0/0x1a0
[  758.842286]  ? do_raw_spin_unlock+0x54/0x270
[  758.846811]  ? smpboot_register_percpu_thread+0x1a0/0x1a0
[  758.852590]  kthread+0x313/0x410
[  758.856170]  ? kthread_bind+0x70/0x70
[  758.860063]  ret_from_fork+0x22/0x30
[  758.863959] Modules linked in: xt_nat xt_tcpudp nft_compat veth nft_counter xt_conntrack bridge nf_conntrack_netlink xfrm_user xfrm_algo overlay cdc_mbim cdc_wdm cdc_ncm 8021q garp stp mrp llc cdc_ether usbnet option usb_wwan mii usbserial amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper libaes crypto_simd cryptd sg pcspkr k10temp fam15h_power sp5100_tco ccp watchdog evdev rng_core button acpi_cpufreq nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct sch_fq_codel nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi crc_t10dif uas crct10dif_generic usb_storage sdhci_pci crct10dif_pclmul crct10dif_common crc32c_intel cqhci ahci xhci_pci libahci sdhci libata ehci_pci
[  758.865135]  xhci_hcd ehci_hcd mmc_core igb i2c_piix4 scsi_mod usbcore ptp pps_core i2c_algo_bit dca gpio_keys
[  758.969517] CR2: ffff888104c9a000
Comment 6 Jouni Seppänen 2021-01-01 08:30:22 UTC
Created attachment 294463 [details]
KASAN output
Comment 7 Jouni Seppänen 2021-01-01 09:52:30 UTC
It looks like the code is calling memset with 0xFFFFFFFE, i.e. negative 2. I added a check in the "If collected data size is less or equal ctx->min_tx_pkt" block:

	if (!(dev->driver_info->flags & FLAG_SEND_ZLP) &&
	    skb_out->len > ctx->min_tx_pkt) {
                if (ctx->tx_curr_size < skb_out->len) {
                        pr_err("ctx->tx_curr_size = %x < skb_out->len = %x", ctx->tx_curr_size, skb_out->len);
                        BUG();
                }
		padding_count = ctx->tx_curr_size - skb_out->len;

And after rerunning my test case, I see:

[  683.677876] ctx->tx_curr_size = 4000 < skb_out->len = 4002
[  683.677953] ------------[ cut here ]------------
[  683.683590] kernel BUG at drivers/net/usb/cdc_ncm.c:1414!

So ctx->tx_curr_size is 0x4000 but skb_out->len is two larger.
Comment 8 Jouni Seppänen 2021-01-01 16:25:06 UTC
I can't claim to understand what all the code in these files does, but I noticed a couple of things:

The functions cdc_ncm_ndp16 and cdc_ncm_ndp32 include this check:

	/* verify that there is room for the NDP and the datagram (reserve) */
	if ((ctx->tx_curr_size - skb->len - reserve) < ctx->max_ndp_size)
		return NULL;

I think the types on the left-hand side of the comparison are as follows:

struct cdc_ncm_ctx {
        ...
	u32 tx_curr_size;
        ...
}

struct sk_buff {
        ...
	unsigned int		len,
				data_len;
        ...
}

static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp16(..., size_t reserve)

Each type is unsigned, so if skb->len + reserve ever happens to be greater than ctx->tx_curr_size, the subtraction would result in a large number and the comparison would almost certainly yield false.

Another thing: there seems to be a set of constants referring to 16/32 bit quantities, and quantities of the same size seem to go together. For example, when cdc_ncm_tx_fixup calls cdc_ncm_fill_tx_frame, it chooses the sign argument like so:

	if (ctx->is_ndp16)
		skb_out = cdc_ncm_fill_tx_frame(dev, skb, cpu_to_le32(USB_CDC_NCM_NDP16_NOCRC_SIGN));
	else
		skb_out = cdc_ncm_fill_tx_frame(dev, skb, cpu_to_le32(USB_CDC_NCM_NDP32_NOCRC_SIGN));

But when cdc_mbim_tx_fixup calls cdc_ncm_fill_tx_frame, the sign argument is either cpu_to_le32(USB_CDC_MBIM_NDP16_IPS_SIGN) or cpu_to_le32(USB_CDC_MBIM_NDP16_DSS_SIGN). The constants USB_CDC_MBIM_NDP32_IPS_SIGN and USB_CDC_MBIM_NDP32_DSS_SIGN are defined in include/uapi/linux/usb/cdc.h but neither seems to be used anywhere in the kernel.
Comment 9 Jouni Seppänen 2021-01-03 15:02:18 UTC
I believe I figured out what is happening. Patch in https://patchwork.kernel.org/project/linux-usb/patch/20210103143602.95343-1-jks@iki.fi/
Comment 10 Jouni Seppänen 2021-01-22 12:20:27 UTC
Fixed in 5.11-rc3, 5.10.8, 5.4.90, 4.19.168, 4.14.216.

Note You need to log in before you can comment on or make changes to this bug.