Bug 218502

Summary: [hv_netvsc] 6.6 regression: `options hv_netvsc ring_size=512` causes "unable to open channel: -12"
Product: Drivers
Component: Network
Reporter: nanericwang
Assignee: drivers_network
Status: NEW
Severity: high
Priority: P3
CC: decui, haiyangz, hjanssen, mhkelley, schakrabarti, stephen
Hardware: All
OS: Linux
Kernel Version: 6.6.16
Regression: Yes
Attachments: complete dmesg log, kernel config

Description nanericwang 2024-02-17 06:36:04 UTC
Created attachment 305885: complete dmesg log

After upgrading from 6.1 LTS to 6.6 LTS, the Hyper-V VM could no longer probe its NIC. dmesg shows:

```
[  +0.000282] hv_vmbus: registering driver hv_netvsc
[  +0.000109] ------------[ cut here ]------------
[  +0.000002] WARNING: CPU: 1 PID: 184 at mm/page_alloc.c:4402 __alloc_pages+0x341/0x350
[  +0.000008] Modules linked in: pcspkr(+) hv_netvsc(+) hyperv_drm hv_balloon hv_utils tcp_bbr sch_fq_pie sch_pie nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 crc32c_intel loop fuse dm_mod nfnetlink bpf_preload ip_tables x_tables atkbd(E) btrfs(E) libcrc32c(E) crc32c_generic(E) raid6_pq(E) xor(E) vivaldi_fmap(E) libps2(E) serio_raw(E) hyperv_keyboard(E) hv_storvsc(E) serio(E) scsi_transport_fc(E) hv_vmbus(E)
[  +0.000027] CPU: 1 PID: 184 Comm: (udev-worker) Tainted: G            E      6.6.16-1-lts #1 b08410a9aad03006712b58b4af2412bf48a1a173
[  +0.000003] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/12/2023
[  +0.000002] RIP: 0010:__alloc_pages+0x341/0x350
[  +0.000004] Code: 89 ee 89 df c6 44 24 20 00 4c 89 64 24 08 41 89 de e8 13 ef ff ff 49 89 c5 e9 7a fe ff ff 80 e3 3f eb c0 c6 05 5c a2 c4 01 01 <0f> 0b eb 98 e8 46 6e a2 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[  +0.000002] RSP: 0000:ffa000000105f970 EFLAGS: 00010246
[  +0.000003] RAX: 0000000000000000 RBX: 0000000000000dc0 RCX: 0000000000000000
[  +0.000002] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000dc0
[  +0.000001] RBP: 000000000000000b R08: 0000000000000000 R09: ffffffffc05875c0
[  +0.000002] R10: ff11000100c10a00 R11: 0000000000000000 R12: 0000000000000000
[  +0.000001] R13: 000000000000000b R14: ff1100010116bc60 R15: ff1100010b980000
[  +0.000002] FS:  0000718badb94500(0000) GS:ff110001bbd00000(0000) knlGS:0000000000000000
[  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000002] CR2: 0000729d2217a000 CR3: 00000001088a0001 CR4: 0000000000371ee0
[  +0.000005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.000002] Call Trace:
[  +0.000004]  <TASK>
[  +0.000001]  ? __alloc_pages+0x341/0x350
[  +0.000003]  ? __warn+0x81/0x130
[  +0.000006]  ? __alloc_pages+0x341/0x350
[  +0.000003]  ? report_bug+0x171/0x1a0
[  +0.000008]  ? handle_bug+0x3c/0x80
[  +0.000004]  ? exc_invalid_op+0x17/0x70
[  +0.000003]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000005]  ? __pfx_netvsc_channel_cb+0x10/0x10 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000013]  ? __alloc_pages+0x341/0x350
[  +0.000004]  vmbus_alloc_ring+0x74/0xd0 [hv_vmbus 0000000000000000000000000000000000000000]
[  +0.000010]  ? __pfx_netvsc_channel_cb+0x10/0x10 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000009]  ? vmbus_open+0x24/0x70 [hv_vmbus 0000000000000000000000000000000000000000]
[  +0.000008]  ? netvsc_device_add+0x285/0xc90 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000010]  ? rndis_filter_device_add+0xd0/0x570 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000010]  ? netvsc_probe+0x293/0x4b0 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000010]  ? vmbus_driver_unregister+0x9e8/0xa90 [hv_vmbus 0000000000000000000000000000000000000000]
[  +0.000008]  ? really_probe+0x19b/0x3e0
[  +0.000004]  ? __pfx___driver_attach+0x10/0x10
[  +0.000002]  ? __driver_probe_device+0x78/0x160
[  +0.000003]  ? driver_probe_device+0x1f/0x90
[  +0.000002]  ? __driver_attach+0xd2/0x1c0
[  +0.000002]  ? bus_for_each_dev+0x85/0xd0
[  +0.000005]  ? bus_add_driver+0x116/0x220
[  +0.000003]  ? driver_register+0x59/0x100
[  +0.000003]  ? __pfx_netvsc_drv_init+0x10/0x10 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000010]  ? netvsc_drv_init+0x49/0xff0 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000009]  ? __pfx_netvsc_drv_init+0x10/0x10 [hv_netvsc 8c513a58a14c16b33afe10e9604949ff35d0e775]
[  +0.000009]  ? do_one_initcall+0x5a/0x320
[  +0.000004]  ? do_init_module+0x60/0x240
[  +0.000005]  ? init_module_from_file+0x89/0xe0
[  +0.000003]  ? idempotent_init_module+0x120/0x2b0
[  +0.000004]  ? __x64_sys_finit_module+0x5e/0xb0
[  +0.000003]  ? do_syscall_64+0x5d/0x90
[  +0.000003]  ? ksys_lseek+0x69/0xb0
[  +0.000005]  ? syscall_exit_to_user_mode+0x2b/0x40
[  +0.000004]  ? do_syscall_64+0x6c/0x90
[  +0.000002]  ? do_syscall_64+0x6c/0x90
[  +0.000002]  ? do_user_addr_fault+0x30f/0x660
[  +0.000006]  ? exc_page_fault+0x7f/0x180
[  +0.000002]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  +0.000007]  </TASK>
[  +0.000001] ---[ end trace 0000000000000000 ]---
[  +0.000003] hv_netvsc 000d3a82-8943-000d-3a82-8943000d3a82 (unnamed net_device) (uninitialized): unable to open channel: -12
[  +0.000213] hv_vmbus: registering driver hid_hyperv
[  +0.002522] hv_netvsc 000d3a82-8943-000d-3a82-8943000d3a82 (unnamed net_device) (uninitialized): unable to add netvsc device (ret -12)
[  +0.000009] hv_vmbus: probe failed for device 000d3a82-8943-000d-3a82-8943000d3a82 (-12)
[  +0.000002] hv_netvsc: probe of 000d3a82-8943-000d-3a82-8943000d3a82 failed with error -12
[  +0.000180] input: Microsoft Vmbus HID-compliant Mouse as /devices/0006:045E:0621.0001/input/input2
[  +0.000968] hid-generic 0006:045E:0621.0001: input: VIRTUAL HID v0.01 Mouse [Microsoft Vmbus HID-compliant Mouse] on 
[  +0.002847] [drm] Initialized hyperv_drm 1.0.0 2020 for 5620e0c7-8062-4dce-aeb7-520c7ef76171 on minor 1
[  +0.000659] fbcon: Deferring console take-over
[  +0.000002] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] fb0: hyperv_drmdrmfb frame buffer device
```
Comment 1 nanericwang 2024-02-17 06:37:20 UTC
Created attachment 305886: kernel config
Comment 2 nanericwang 2024-02-17 07:15:09 UTC
Just found the repro steps:

1. echo "options hv_netvsc ring_size=512" > /etc/modprobe.d/hv_netvsc.conf
2. reboot with kernel 6.6.x
Comment 3 Stephen Hemminger 2024-02-17 16:57:31 UTC
FYI, -12 is an errno value, i.e. ENOMEM.
This setting causes the driver to allocate too much memory.
There is also a bug here in the unwind of vmbus_driver_unregister for the case where the allocation failed.
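
For reference, the mapping from the -12 in the log to ENOMEM can be checked with Python's standard `errno` module. This is only a minimal sketch; the helper name `describe_kernel_error` is made up for illustration and is not part of any kernel tooling.

```python
import errno
import os

def describe_kernel_error(ret: int) -> str:
    """Translate a kernel-style negative return code into its errno name and message.

    Hypothetical helper for illustration only.
    """
    code = -ret if ret < 0 else ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

# The value reported by "unable to open channel: -12"
print(describe_kernel_error(-12))
```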
Comment 4 Souradeep Chakrabarti 2024-02-26 13:46:43 UTC
I tried to repro, but it did not reproduce.

schakrabarti ~
> sudo dmesg | grep netvsc
[    3.768983] hv_vmbus: registering driver hv_netvsc
[    3.885753] hv_netvsc 002248b3-d501-0022-48b3-d501002248b3 eth0: VF slot 1 added
[    6.180652] hv_netvsc 002248b3-d501-0022-48b3-d501002248b3 eth0: VF registering: eth1
[    6.665438] hv_netvsc 002248b3-d501-0022-48b3-d501002248b3 eth0: Data path switched to VF: enP16555s1
[    6.786907] hv_netvsc 002248b3-d501-0022-48b3-d501002248b3 eth0: Data path switched from VF: enP16555s1
[    7.187315] hv_netvsc 002248b3-d501-0022-48b3-d501002248b3 eth0: Data path switched to VF: enP16555s1
schakrabarti ~
> uname -a
Linux schakrabarti-new-ubuntu 6.6.0 #1 SMP Mon Feb 26 11:25:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
schakrabarti ~
> cat /etc/modprobe.d/hv_netvsc.conf
options hv_netvsc ring_size=512
schakrabarti ~

Can you please share the Azure SKU details?
I tried Standard D16ds v4 (16 vCPUs, 64 GiB memory) with the config mentioned here, but the issue does not reproduce.
Comment 5 Dexuan Cui 2024-02-26 20:38:18 UTC
This is caused by commit
6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes")

Souradeep, to repro the bug, we need a kernel that contains the commit above.
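
A rough sketch of how a change in the ring size calculation could turn ring_size=512 into a failed allocation. Everything here is an assumption about the 6.6 code rather than something stated in this report: 4 KiB pages, one extra page of ring buffer header added to each ring after the commit, the send and receive rings allocated together as one contiguous block, and a largest permitted allocation order of 10 (4 MiB). An order of 11 would at least be consistent with RBP/R13 = 0xb in the register dump of the original description.

```python
PAGE_SIZE = 4096          # assumed x86_64 page size
MAX_ORDER = 10            # assumed largest allocation order (2**10 pages = 4 MiB)
HEADER_BYTES = PAGE_SIZE  # assumed: one page of ring buffer header per ring

def get_order(size: int) -> int:
    """Smallest order such that PAGE_SIZE * 2**order >= size."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    return max(pages - 1, 0).bit_length()

def channel_order(ring_pages: int, with_header: bool) -> int:
    """Order of one contiguous allocation holding both the send and receive ring."""
    ring_bytes = ring_pages * PAGE_SIZE + (HEADER_BYTES if with_header else 0)
    return get_order(2 * ring_bytes)

for with_header in (False, True):
    order = channel_order(512, with_header)
    verdict = "ok" if order <= MAX_ORDER else "too large, allocation fails"
    print(f"ring_size=512 with_header={with_header}: order {order} ({verdict})")
```

Under these assumptions, the extra header page pushes the combined size just past 4 MiB, so the request rounds up to the next order and is rejected outright, which would surface as ENOMEM.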
Comment 6 Dexuan Cui 2024-02-27 05:57:34 UTC
Michael posted a fix here:
https://lwn.net/ml/linux-kernel/20240213061959.782110-1-mhklinux@outlook.com/
Comment 7 Souradeep Chakrabarti 2024-02-27 06:16:30 UTC
(In reply to Dexuan Cui from comment #5)
> This is caused by 
> 6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not
> 4 Kbytes")
> 
> Souradeep, to repro the bug, we need a kernel that contains the commit above.

I will repro with the above and will test with the fix shared by Michael.
Comment 8 Michael Kelley 2024-02-27 06:45:00 UTC
Just curious -- what's the scenario where setting hv_netvsc.ring_size to 512 is needed? That's 512 pages, so 2 Mbytes for the ring in each direction per channel, instead of the 512 Kbyte default.
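
The arithmetic behind those figures, as a small sketch. It assumes the ring_size parameter counts 4 KiB pages; the 128-page figure is simply the 512 Kbyte default divided by that page size, not a value quoted in this report.

```python
RING_PAGE_BYTES = 4096  # assumed size of one ring_size unit (a 4 KiB page)

def ring_bytes(ring_size_pages: int) -> int:
    """Bytes used by one ring (one direction, one channel) for a given ring_size."""
    return ring_size_pages * RING_PAGE_BYTES

requested = ring_bytes(512)   # ring_size=512 from the modprobe option
default_bytes = 512 * 1024    # the 512 Kbyte default mentioned above

print(requested // (1024 * 1024), "MiB per direction per channel")
print(default_bytes // RING_PAGE_BYTES, "pages in the default ring")
print(requested // default_bytes, "times the default size")
```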

Sorry for causing a problem :-(
Comment 9 Souradeep Chakrabarti 2024-02-27 21:49:55 UTC
(In reply to Dexuan Cui from comment #6)
> Michael posted a fix here:
> https://lwn.net/ml/linux-kernel/20240213061959.782110-1-mhklinux@outlook.com/

I have tested the fix and it resolves the issue. I have updated the patch mail thread.
Comment 10 Dexuan Cui 2024-03-01 16:29:08 UTC
The fix is in Hyper-V tree now:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=hyperv-fixes&id=b8209544296edbd1af186e2ea9c648642c37b18c

It should be merged into mainline soon (probably within a few days or weeks).