|Summary:||regression: physical to VETH (LXC) network bridge after updating to 4.20.0|
|Product:||Networking||Reporter:||Michael Evans (mjevans1983)|
|Component:||Other||Assignee:||Stephen Hemminger (stephen)|
|Severity:||blocking||CC:||Ian.kumlien, linux-bugzilla, mjevans1983|
Description Michael Evans 2019-01-11 22:58:45 UTC
I have since reverted to the working LTS kernel image offered by Arch Linux (4.19.13), but I am willing to re-test and gather additional data during a couple of lower-use time periods each week.

After updating to Linux 4.20.0 (along with a full system update otherwise), my BRIDGED network connections to some LXC containers ceased working. Attempting to troubleshoot this issue also produced extremely odd results, which I think offhand MIGHT have caused network packets to fill up some kind of memory buffer instead of being relayed or dropped. There are some additional details in the serverfault question and LXC bug that I filed, as it was initially (and still is) unclear where the actual issue is. At this time I am unsure whether it is related to netdev (bridge, veth), cgroups, or some changed default that now needs to be configured differently from previous defaults.

https://serverfault.com/questions/947848/linux-bridge-broken-after-upgrade-out-of-ideas-places-to-look-now-4-20-0-arc
https://github.com/lxc/lxc/issues/2769

* It is NOT related to IP forwarding, as this is a BRIDGED connection, not a routed one, and it works on older kernels without that enabled.
* Physical network to bridge works (and will stay connected for a few minutes after later troubleshooting steps, even if ARP caches / ping flake out and stop responding).
* VETH (within LXC) can ping the host IP on the bridge if manually assigned a static address, but it cannot ping the gateway (the host itself can reach the gateway before this step). Doing this seems to cause general instability and a timed-out SSH session.

This led me to reboot between each round of testing to ensure I had a clean slate to start with.

I went over the major settings that I checked in the other two bug reports, but I'm open to checking other values and/or performing different kinds of tests occasionally over a given week. Responses won't be immediate, but I'll try to check on this frequently over the next two weeks.
Comment 1 Ian Kumlien 2019-01-12 10:14:22 UTC
Try applying this patch:
https://marc.info/?l=linux-netdev&m=154696956604748&w=2

It solved it for me. What qdisc do you use?
(tc qdisc will list them; I was using fq, which is why it hit me.)
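
For anyone comparing setups, the qdisc check suggested above can be scripted. The following is a minimal sketch, not part of the original report: it assumes `tc qdisc show` prints lines of the usual form `qdisc <kind> <handle> dev <device> ...`, and the sample text and device names (eth0, br0, veth0) are made up for illustration. The helper names are hypothetical.

```python
import re
import subprocess

# Illustrative sample only; real `tc qdisc show` output differs per machine.
SAMPLE = """\
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq 0: dev eth0 root refcnt 2 limit 10000p flow_limit 100p
qdisc noqueue 0: dev br0 root refcnt 2
qdisc noqueue 0: dev veth0 root refcnt 2
"""

# Assumed line shape: "qdisc <kind> <handle> dev <device> ..."
QDISC_LINE = re.compile(r"^qdisc\s+(\S+)\s+\S+\s+dev\s+(\S+)")


def qdiscs_by_device(text):
    """Map each device name to the list of qdisc kinds attached to it."""
    result = {}
    for line in text.splitlines():
        match = QDISC_LINE.match(line)
        if match:
            kind, dev = match.groups()
            result.setdefault(dev, []).append(kind)
    return result


def devices_using(text, kind="fq"):
    """Return the sorted device names that have a qdisc of the given kind."""
    return sorted(
        dev for dev, kinds in qdiscs_by_device(text).items() if kind in kinds
    )


def live_tc_output():
    """Run `tc qdisc show` on the local machine and return its stdout."""
    return subprocess.run(
        ["tc", "qdisc", "show"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    print(devices_using(SAMPLE))
```

On a real host, pass `live_tc_output()` instead of `SAMPLE`. A device listed for `fq` matches the configuration reported above as triggering the problem, though this thread alone does not establish that other qdiscs are unaffected.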
Comment 2 Michael Evans 2019-01-14 00:44:53 UTC
(In reply to Ian Kumlien from comment #1)
> Try applying this patch:
> https://marc.info/?l=linux-netdev&m=154696956604748&w=2
>
> It solved it for me, what qdisc do you use?
> (tc qdisc will list them - I was using fq which is why it hit me)

Thank you. I can confirm that applying that single-line patch DOES make the difference and resolves the issue (for me). However, as the currently published kernel versions still need this patch back-ported, this bug shouldn't be closed yet.

Arch Linux had a package that made testing a custom kernel build easier, but it was based on 4.20.2, so I re-tested both without the patch (failed, as expected) and with it (appears to be working, as hoped).
Comment 3 Ian Kumlien 2019-01-14 08:20:56 UTC
Yeah, it didn't make 4.20.2 - It has been picked up and marked for -stable so hopefully it will be in 4.20.3 :)
Comment 4 Ian Kumlien 2019-01-14 14:59:30 UTC
FYI, it's in the current pull set posted to Linus. It is patch 15 in:
https://marc.info/?l=linux-netdev&m=154741526902566&w=2
Comment 5 Dragoon Aethis 2019-01-18 21:12:35 UTC
I've had a similar issue with bridged networking in QEMU (TAP networking to a bridge with an enslaved host interface), and the patch mentioned above did solve it. In my case both the VM and the host lost internet connectivity; setting the host interface down and then up again brought networking back for the host. The patch is not yet included in 4.20.3 either, for anyone looking for this.
Comment 6 Ian Kumlien 2019-01-25 15:13:53 UTC
It's included in: 4.20.5-rc1

So, it should be in 4.20.5 final ;)
Comment 7 Ian Kumlien 2019-01-27 23:06:04 UTC
Released and confirmed working. IMHO this bug report can be closed as fixed in 4.20.5.
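
To check whether a given kernel release falls in the range discussed in this thread, a small version comparison is enough. This is a rough sketch under stated assumptions: the bounds (first reported broken in 4.20.0, fixed in 4.20.5) come only from the comments above, the function names are hypothetical, and it cannot detect a distribution kernel that carries the patch as a backport.

```python
import re

# Bounds taken from this thread only (assumption: no other series is affected).
FIRST_REPORTED_BAD = (4, 20, 0)  # regression reported after updating to 4.20.0
FIXED_IN = (4, 20, 5)            # comment 7: released and confirmed in 4.20.5


def parse_release(release):
    """Turn a release string such as '4.20.3-arch1-1' into (4, 20, 3)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def likely_affected(release):
    """True if the release lies in [4.20.0, 4.20.5) according to this thread."""
    return FIRST_REPORTED_BAD <= parse_release(release) < FIXED_IN


if __name__ == "__main__":
    import platform

    running = platform.release()
    print(running, "likely affected:", likely_affected(running))
```

Release candidates are treated like their final version here, so `4.20.5-rc1` counts as fixed, which matches comment 6.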