Bug 118771

Summary: hv_netvsc panics when given a shared skbuff with < needed_headroom headroom and
Product: Drivers Reporter: Robert Collins (robertc)
Component: NetworkAssignee: drivers_network (drivers_network)
Status: RESOLVED INVALID    
Severity: normal CC: stephen
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.4.0 Subsystem:
Regression: No Bisected commit-id:

Description Robert Collins 2016-05-23 20:03:05 UTC
I consider this a bug because in the netdevice header:
```
 *      @needed_headroom: Extra headroom the hardware may need, but not in all
 *                        cases can this be guaranteed
```

so it seems that hv_netvsc should indeed handle this situation.

details...

Example bt:

       PANIC: "kernel BUG at /build/linux-FvcHlK/linux-4.4.0/net/core/skbuff.c:1201!"
         PID: 11022
     COMMAND: "pkt-gen"
        TASK: ffff8801f3633700  [THREAD_INFO: ffff8801f73e4000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 11022  TASK: ffff8801f3633700  CPU: 0   COMMAND: "pkt-gen"
 #0 [ffff8801f73e7168] machine_kexec at ffffffff8105befb
 #1 [ffff8801f73e71c8] crash_kexec at ffffffff8110da12
 #2 [ffff8801f73e7298] oops_end at ffffffff81031c29
 #3 [ffff8801f73e72c0] die at ffffffff810320db
 #4 [ffff8801f73e72f0] do_trap at ffffffff8102f0e1
 #5 [ffff8801f73e7340] do_error_trap at ffffffff8102f489
 #6 [ffff8801f73e7400] do_invalid_op at ffffffff8102fa40
 #7 [ffff8801f73e7410] invalid_op at ffffffff81826e0e
    [exception RIP: pskb_expand_head+579]
    RIP: ffffffff81709673  RSP: ffff8801f73e74c8  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffff88007aba2c00  RCX: 0000000002080020
    RDX: 0000000000000fc0  RSI: 0000000000000100  RDI: ffff88007aba2c00
    RBP: ffff8801f73e7500   R8: 000000000001a000   R9: 0000000000000001
    R10: ffff8801f8130ec0  R11: 0000000000000000  R12: ffff8801f8130000
    R13: 0000000000000ec0  R14: 0000000000000000  R15: ffff88007aba2c00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8801f73e7508] netvsc_start_xmit at ffffffffc00853e0 [hv_netvsc]
 #9 [ffff8801f73e7780] generic_ndo_start_xmit at ffffffffc019fc4f [netmap]
#10 [ffff8801f73e7790] dev_hard_start_xmit at ffffffff8171d5e9
#11 [ffff8801f73e7800] sch_direct_xmit at ffffffff817420ff
#12 [ffff8801f73e7850] __qdisc_run at ffffffff817422e3
#13 [ffff8801f73e78a0] __dev_queue_xmit at ffffffff8171db8d
#14 [ffff8801f73e78f0] dev_queue_xmit at ffffffff8171de50
#15 [ffff8801f73e7900] nm_os_generic_xmit_frame at ffffffffc01a0ab3 [netmap]
#16 [ffff8801f73e7940] generic_netmap_txsync at ffffffffc0199402 [netmap]


This was triggered via netmap (which is about to have a workaround added to it) which was transmitting a shared skb without enough headroom. skb_cow_head() will call into pskb_expand_headroom if it needs to expand headroom, but when the reference count is not 1, pskb_expand_headroom panics.

I'm not sure whether a) skb_cow_head is wrong (and it should copy the skb in this situation) or b) whether hv_netvsc shoudl not be using skb_cow_head here.
Comment 1 Robert Collins 2016-05-23 22:47:36 UTC
Reproduction instructions.

git clone https://github.com/luigirizzo/netmap.git
git reset --hard 8123c744470aa4562635dd139d3895ed68d45dd7
cd LINUX
./configure --no-drivers
make
sudo insmod netmap.ko
cd ../examples
make
sudo ./pkt-gen -i eth0 -f tx -d 192.168.1.2:80 -s 192.168.137.75:1023 -n 5000 -S 00:15:5d:ba:c3:01 -D 00:15:5d:ba:c3:00

This should trigger the failure.

The underlying kernel calls to do the same manually would be something like:

/* Note, there is no headroom allocated */
skb = alloc_skb(1800, GFP_ATOMIC);
skb->dev = dev;
skb_copy_to_linear_data(skb, somedata, 1800);
atomic_inc(&skb->users);
dev_queue_xmit(skb);

Note that the patched driver in master still calls skb_cow_head to ask for enough headroom - and when there is no headroom this will call pskb_reserve_head, which with users != 1 will barf.
Comment 2 Stephen Hemminger 2017-01-24 23:17:26 UTC
Bug is invalid.
Kernel bugzilla is only for upstream kernel bugs WITH NO OUT OF TREE drivers.
Comment 3 Robert Collins 2018-01-21 09:49:23 UTC
I'm confused: is it invalid because I triggered the panic using netmap?

If so fair enough, but is there not an actual bug here in the hyperv driver which *is* in-tree? As far as I can tell it is legimitate (if illadvised) to submit an skb with 2 users linear data and insufficient headroom for the hyperv bus, and that is what will panic, not the netmap code.
Comment 4 Stephen Hemminger 2018-01-21 17:32:21 UTC
Unless you can reproduce with no out of tree drivers, it is invalid.
Comment 5 Robert Collins 2018-01-22 00:13:01 UTC
That confuses me because I would have thought that latent bugs were still bugs even if the current code doesn't actually provoke it.

In this case a driver panicing the kernel if dev->needed_headroom is not provided - because it is documented as being not guaranteed.

However its your policy not mine, so I'll leave this alone at this point.