Bug 104161

Summary: DLNA services disappear on bridge device shortly after starting the dlna service
Product: Networking
Reporter: Tobias Powalowski (t.powa)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX
Severity: normal
CC: andy, davem, evangelos, jlec, linus.luessing, t.powa
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  verbose debug patch for v4.2.0
  Logfile with debug patch enabled
  proposed patch to fix a regression introduced by the commit mentioned by Tobias

Description Tobias Powalowski 2015-09-07 13:31:54 UTC
Hi,
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9afd85c9e4552b276e2f4cfefd622bdeeffbbf26

This commit introduces weird behaviour on my DLNA server; I found it by bisecting.

The commit was merged into the 4.2 series. Shortly after my DLNA/UPnP services
start, other clients can no longer discover them.
Localhost does not seem to be affected, but every external discovery is broken.
The DLNA services run on a bridge device with IPv4. MythTV and MiniDLNA both
disappear after a few minutes of running.
4.2 with these 3 patches reverted works as it should:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a516993f0ac1694673412eb2d16a091eafa77d2a
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fcba67c94abe83e0e69a65737000ccbb16a4fa03
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9afd85c9e4552b276e2f4cfefd622bdeeffbbf26

Do you need any more information to get this really weird bug fixed?

Thanks in advance.
Comment 1 Tobias Powalowski 2015-09-09 06:26:24 UTC
Here comes the next interesting thing:
If I run
tcpdump -i br0 'ip6 proto 0'
all is working fine.

If I stop tcpdump, the DLNA services stop showing up.
With promiscuous mode enabled it does not fail:
[ 1212.138260] device br0 entered promiscuous mode
works
[ 1246.908027] device br0 left promiscuous mode
does not work

bridge mdb show
dev br0 port enp2s0 grp 224.0.1.60 temp
dev br0 port enp2s0 grp 239.255.255.250 temp

I only have Arch Linux environments; all external connections are affected.
My Panasonic TV is not working and my Sony Xperia Z1 compact mobile
phones are also not working.
VLC on Windows 7 shows the same. So this is a generic issue I guess.
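
For anyone who wants to check discovery from an external client without a TV or phone, here is a minimal sketch of an SSDP probe in Python. It sends an M-SEARCH request to the standard SSDP group 239.255.255.250 on UDP port 1900 (the same group that shows up in the mdb output above) and collects the addresses that answer. The function names `build_msearch` and `discover`, the timeout, and the TTL are assumptions made for illustration; they are not part of any tool used in this report.

```python
import socket

SSDP_GROUP = "239.255.255.250"  # standard SSDP multicast group
SSDP_PORT = 1900                # standard SSDP UDP port


def build_msearch(search_target="ssdp:all", mx=2):
    """Build an SSDP M-SEARCH request (hypothetical helper for this sketch)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_GROUP}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {search_target}",
        "",
        "",  # the two empty entries produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def discover(timeout=3.0):
    """Send one M-SEARCH and return the sorted IPs of all hosts that replied."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 2 is an arbitrary choice for a small home LAN.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.settimeout(timeout)
    responders = set()
    try:
        sock.sendto(build_msearch(), (SSDP_GROUP, SSDP_PORT))
        while True:
            try:
                _data, addr = sock.recvfrom(65507)
            except socket.timeout:
                break  # no more replies within the timeout
            responders.add(addr[0])
    finally:
        sock.close()
    return sorted(responders)
```

Calling `discover()` from another machine on the LAN a few minutes after the DLNA service has started, once with tcpdump running on br0 and once without, should show whether the server's address drops out of the result.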
Comment 2 Tobias Powalowski 2015-09-09 06:29:30 UTC
DLNA is not bound to any interface, ebtables is not installed, and
iptables doesn't show any active rules.
Is there any network command whose output might help?
Thanks for investigating this bug.
Comment 3 Linus Lüssing 2015-09-10 03:15:52 UTC
Created attachment 187241
verbose debug patch for v4.2.0

Here's a debug patch for the 4.2 release which should give us a little more information about where things are going wrong. Please add it and attach the output from dmesg here.
Comment 4 Linus Lüssing 2015-09-10 03:35:27 UTC
Some more information about the setup which Tobias and I gathered while chatting on IRC:

* bridge has "multicast_querier" disabled in sysfs
* There is no MLD querier on the link
* There is an IGMPv3 querier, which runs on a separate router rather than on one of the DLNA participants
* IPv6 link-local ping6 works fine (but probably due to bridge snooping being deactivated as there is no MLD querier)
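
The bridge settings listed above can be read back from sysfs. The following is a small sketch, assuming the bridge is named br0 and that the kernel exposes `multicast_snooping` and `multicast_querier` under `/sys/class/net/<bridge>/bridge/`. The function name and the `sysfs_root` parameter are made up for this example.

```python
from pathlib import Path


def read_bridge_multicast_settings(bridge="br0", sysfs_root="/sys/class/net"):
    """Return the bridge's multicast snooping/querier flags as integers.

    A value of None means the file was missing or could not be parsed.
    """
    bridge_dir = Path(sysfs_root) / bridge / "bridge"
    settings = {}
    for name in ("multicast_snooping", "multicast_querier"):
        try:
            settings[name] = int((bridge_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings
```

On the setup described here one would expect `multicast_querier` to read 0; the report does not state the value of `multicast_snooping`.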

I tried reproducing this setup with an IGMPv3 querier, but an IPv4 multicast iperf still worked fine for me.
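
For reference, the receiving side of such an IPv4 multicast test can be approximated without iperf. The sketch below joins a multicast group, which makes the host send an IGMP membership report, and counts the datagrams that arrive within a time window. It is not the iperf invocation used above; the function names, the group, and the port are placeholders chosen for illustration.

```python
import socket
import struct
import time


def build_mreq(group, interface_ip="0.0.0.0"):
    """Pack an ip_mreq: multicast group followed by the local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip))


def count_multicast_packets(group, port, duration=10.0):
    """Join `group`, listen on `port` for `duration` seconds, return the packet count."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    received = 0
    try:
        sock.bind(("", port))
        # Joining the group is what triggers the IGMP report seen by a snooping bridge.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
        deadline = time.monotonic() + duration
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                sock.recvfrom(65535)
            except socket.timeout:
                break
            received += 1
    finally:
        sock.close()
    return received
```

Running something like `count_multicast_packets("239.1.2.3", 5001)` on the bridge host while a sender on another machine transmits to the same (arbitrary) group and port gives a rough indication of whether multicast traffic still reaches the host behind br0.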

As the bridge multicast snooping seems to kick in for IPv4, I assume that IGMP query parsing is working fine here.

A wild guess: The skb_network_header isn't set properly by the driver Tobias is using for his enp2s0, while the driver in my VM does set it. At least the bridge doesn't set it. Setting the interface to promiscuous mode activates a code path in the kernel which resets the network header to a proper value. IGMPv3 queries might be parsed fine with or without promiscuous mode, as the IP stack needs to parse them to generate reports and might reset the network header while doing that. (Though let's see what the debug patch will return - maybe this theory is bullshit :) )
Comment 5 Tobias Powalowski 2015-09-10 15:11:30 UTC
Created attachment 187281
Logfile with debug patch enabled
Comment 6 Linus Lüssing 2015-09-11 07:38:25 UTC
Created attachment 187311
proposed patch to fix a regression introduced by the commit mentioned by Tobias

I think I found the bug. Can you give this patch a try?

If it works for you too then I'll send it to netdev@ with a proper commit message and with your email address as "Reported-by:" and "Tested-by:".
Comment 7 Tobias Powalowski 2015-09-11 10:46:58 UTC
Patch fixes my issue.
Thank you very much :)