I consider this a bug because in the netdevice header: ``` * @needed_headroom: Extra headroom the hardware may need, but not in all * cases can this be guaranteed ``` so it seems that hv_netvsc should indeed handle this situation. details... Example bt: PANIC: "kernel BUG at /build/linux-FvcHlK/linux-4.4.0/net/core/skbuff.c:1201!" PID: 11022 COMMAND: "pkt-gen" TASK: ffff8801f3633700 [THREAD_INFO: ffff8801f73e4000] CPU: 0 STATE: TASK_RUNNING (PANIC) crash> bt PID: 11022 TASK: ffff8801f3633700 CPU: 0 COMMAND: "pkt-gen" #0 [ffff8801f73e7168] machine_kexec at ffffffff8105befb #1 [ffff8801f73e71c8] crash_kexec at ffffffff8110da12 #2 [ffff8801f73e7298] oops_end at ffffffff81031c29 #3 [ffff8801f73e72c0] die at ffffffff810320db #4 [ffff8801f73e72f0] do_trap at ffffffff8102f0e1 #5 [ffff8801f73e7340] do_error_trap at ffffffff8102f489 #6 [ffff8801f73e7400] do_invalid_op at ffffffff8102fa40 #7 [ffff8801f73e7410] invalid_op at ffffffff81826e0e [exception RIP: pskb_expand_head+579] RIP: ffffffff81709673 RSP: ffff8801f73e74c8 RFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff88007aba2c00 RCX: 0000000002080020 RDX: 0000000000000fc0 RSI: 0000000000000100 RDI: ffff88007aba2c00 RBP: ffff8801f73e7500 R8: 000000000001a000 R9: 0000000000000001 R10: ffff8801f8130ec0 R11: 0000000000000000 R12: ffff8801f8130000 R13: 0000000000000ec0 R14: 0000000000000000 R15: ffff88007aba2c00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff8801f73e7508] netvsc_start_xmit at ffffffffc00853e0 [hv_netvsc] #9 [ffff8801f73e7780] generic_ndo_start_xmit at ffffffffc019fc4f [netmap] #10 [ffff8801f73e7790] dev_hard_start_xmit at ffffffff8171d5e9 #11 [ffff8801f73e7800] sch_direct_xmit at ffffffff817420ff #12 [ffff8801f73e7850] __qdisc_run at ffffffff817422e3 #13 [ffff8801f73e78a0] __dev_queue_xmit at ffffffff8171db8d #14 [ffff8801f73e78f0] dev_queue_xmit at ffffffff8171de50 #15 [ffff8801f73e7900] nm_os_generic_xmit_frame at ffffffffc01a0ab3 [netmap] #16 [ffff8801f73e7940] generic_netmap_txsync at ffffffffc0199402 [netmap] This was triggered via netmap (which is about to have a workaround added to it) which was transmitting a shared skb without enough headroom. skb_cow_head() will call into pskb_expand_headroom if it needs to expand headroom, but when the reference count is not 1, pskb_expand_headroom panics. I'm not sure whether a) skb_cow_head is wrong (and it should copy the skb in this situation) or b) whether hv_netvsc shoudl not be using skb_cow_head here.
Reproduction instructions. git clone https://github.com/luigirizzo/netmap.git git reset --hard 8123c744470aa4562635dd139d3895ed68d45dd7 cd LINUX ./configure --no-drivers make sudo insmod netmap.ko cd ../examples make sudo ./pkt-gen -i eth0 -f tx -d 192.168.1.2:80 -s 192.168.137.75:1023 -n 5000 -S 00:15:5d:ba:c3:01 -D 00:15:5d:ba:c3:00 This should trigger the failure. The underlying kernel calls to do the same manually would be something like: /* Note, there is no headroom allocated */ skb = alloc_skb(1800, GFP_ATOMIC); skb->dev = dev; skb_copy_to_linear_data(skb, somedata, 1800); atomic_inc(&skb->users); dev_queue_xmit(skb); Note that the patched driver in master still calls skb_cow_head to ask for enough headroom - and when there is no headroom this will call pskb_reserve_head, which with users != 1 will barf.
Bug is invalid. Kernel bugzilla is only for upstream kernel bugs WITH NO OUT OF TREE drivers.
I'm confused: is it invalid because I triggered the panic using netmap? If so fair enough, but is there not an actual bug here in the hyperv driver which *is* in-tree? As far as I can tell it is legimitate (if illadvised) to submit an skb with 2 users linear data and insufficient headroom for the hyperv bus, and that is what will panic, not the netmap code.
Unless you can reproduce with no out of tree drivers, it is invalid.
That confuses me because I would have thought that latent bugs were still bugs even if the current code doesn't actually provoke it. In this case a driver panicing the kernel if dev->needed_headroom is not provided - because it is documented as being not guaranteed. However its your policy not mine, so I'll leave this alone at this point.