I have since reverted to the working LTS kernel image offered by Arch Linux (4.19.13), but I am willing to re-test and gather additional data during a couple of lower-use time periods in the week.
After updating to Linux 4.20.0 (along with an otherwise full system update), my BRIDGED network connections to some LXC containers stopped working.
Attempting to troubleshoot this issue also produced extremely odd results. Offhand, I think these MIGHT have been caused by network packets filling up some kind of memory buffer instead of being relayed or dropped. There are additional details in the Server Fault and LXC bug reports that I filed, since it was initially (and still is) unclear where the actual issue lies.
At this time I am unsure whether it is related to netdev (bridge, veth), cgroups, or some changed default that now needs to be configured differently from the previous defaults.
* It is NOT related to IP forwarding, as this is a BRIDGED connection, not a routed one, and it works on older kernels without that enabled.
* Physical network to bridge works (and will stay connected for a few minutes after later troubleshooting steps, even if ARP caches / ping flake out and stop responding).
* VETH (within LXC) can ping the host IP on the bridge, but not the gateway (which the host itself could reach before this step), if manually assigned a static address. Doing this seems to cause general instability and a timed-out SSH session, which led me to reboot between each round of testing to ensure I started from a clean slate.
I covered the major settings I checked in the other two bug reports, but I'm open to checking other values and/or occasionally performing different kinds of tests over a given week. Responses won't be immediate, but I'll try to check on this frequently over the next two weeks.
Try applying this patch:
It solved it for me, what qdisc do you use?
(tc qdisc will list them - I was using fq which is why it hit me)
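As a rough aid for the "what qdisc do you use?" question, here is a minimal Python sketch that parses the text printed by `tc qdisc show` and reports which devices use `fq`. It assumes the usual line shape `qdisc <kind> <handle>: dev <device> ...`; the helper names `parse_qdiscs` and `devices_using_fq` are hypothetical. It only checks for `fq` because that is the qdisc the commenter reported; whether other qdiscs trigger the problem is not established here.

```python
import shutil
import subprocess


def parse_qdiscs(output: str) -> list[tuple[str, str]]:
    """Return (kind, device) pairs from `tc qdisc show` text output (hypothetical helper)."""
    result = []
    for line in output.splitlines():
        tokens = line.split()
        # Assumed line shape: qdisc <kind> <handle>: dev <device> ...
        if len(tokens) < 2 or tokens[0] != "qdisc":
            continue
        kind = tokens[1]
        device = ""
        if "dev" in tokens:
            idx = tokens.index("dev")
            if idx + 1 < len(tokens):
                device = tokens[idx + 1]
        result.append((kind, device))
    return result


def devices_using_fq(output: str) -> list[str]:
    """Devices whose qdisc kind is exactly 'fq' (so 'fq_codel' is not matched)."""
    return [dev for kind, dev in parse_qdiscs(output) if kind == "fq"]


if __name__ == "__main__":
    # Only call tc if it is installed; otherwise just say so.
    if shutil.which("tc") is None:
        print("tc not found; install iproute2 to list qdiscs")
    else:
        text = subprocess.run(
            ["tc", "qdisc", "show"], capture_output=True, text=True, check=False
        ).stdout
        for kind, dev in parse_qdiscs(text):
            print(f"{dev}: {kind}")
        print("devices using fq:", devices_using_fq(text))
```

Matching the kind exactly (rather than with a prefix test) keeps `fq_codel`, a different qdisc, out of the `fq` list.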
(In reply to Ian Kumlien from comment #1)
> Try applying this patch:
> It solved it for me, what qdisc do you use?
> (tc qdisc will list them - I was using fq which is why it hit me)
Thank you, I can confirm that applying that single-line patch DOES make the difference and resolves the issue (for me). However, since the currently published kernel versions still need this patch backported, this bug shouldn't be closed yet.
Arch Linux had a package that made testing a custom kernel build easier, but it was based on 4.20.2, so I re-tested both without the patch (failed, as expected) and with it (appears to be working, as hoped).
Yeah, it didn't make 4.20.2; it has been picked up and marked for -stable, so hopefully it will be in 4.20.3 :)
FYI it's in the current pull set posted to Linus
Patch 15 in:
I've had a similar issue with bridged networking in QEMU (TAP networking to a bridge with an enslaved host interface), and the patch mentioned above did solve it. Both the VM and the host lost internet connectivity; setting the host interface down and then up again brought networking back for the host. For anyone looking for this: it's not yet included in 4.20.3 either.
It's included in: 4.20.5-rc1
So, it should be in 4.20.5 final ;)
Released and confirmed working, IMHO this bug report can be closed with fixed in 4.20.5
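For anyone checking whether a running kernel falls in the window discussed here, a minimal Python sketch follows. It assumes, from this thread only, that 4.20.0 through 4.20.4 lack the fix (it first appeared in 4.20.5-rc1 and shipped in 4.20.5) and that the 4.19 LTS series was unaffected for the reporter. It makes no claim about other kernel series; the names `parse_version` and `in_reported_affected_range` are hypothetical.

```python
import platform
import re

# From this thread: the fix first appeared in 4.20.5 (4.20.5-rc1).
FIXED_IN = (4, 20, 5)


def parse_version(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string like '4.20.3-arch1-1-ARCH'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch component (e.g. '4.20.arch1') is treated as .0
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def in_reported_affected_range(release: str) -> bool:
    """True only for 4.20.0-4.20.4, the releases this thread reports as lacking the fix."""
    major, minor, patch = parse_version(release)
    return (major, minor) == FIXED_IN[:2] and patch < FIXED_IN[2]


if __name__ == "__main__":
    release = platform.release()
    try:
        print(release, "in reported affected range:", in_reported_affected_range(release))
    except ValueError as exc:
        print(exc)
```

Because it only encodes what the thread states, a `False` result means "not in the range reported here", not "verified unaffected".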