Bug 202669
| Summary: | Kernel panic in ip6_expire_frag_queue [regression] | | |
|---|---|---|---|
| Product: | Networking | Reporter: | Ralf (post+kernel) |
| Component: | IPV6 | Assignee: | Hideaki YOSHIFUJI (yoshfuji) |
| Status: | NEW --- | | |
| Severity: | normal | CC: | eric.dumazet, hessu |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.9.144 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | 5 kernel stack traces from 4 servers exhibiting the issue | | |
Description
Ralf
2019-02-24 17:28:14 UTC
Someone now also ran into this issue on Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1824687
Created attachment 282563
5 kernel stack traces from 4 servers exhibiting the issue
I have had this crash, with the ip6_expire_frag_queue stack trace, more than 18 times since 2019-04-16 on more than 10 different servers in 8 different countries. There have been some more crashes, but for the ones counted below the panic dump reached a remote syslog server, where it is easy to grep. Crash count by kernel version (these are on both Ubuntu 14.04 trusty and 16.04 xenial):
2 crashes: 4.4.0-144-generic #170~14.04.1-Ubuntu
8 crashes: 4.4.0-145-generic #171-Ubuntu
8 crashes: 4.4.0-146-generic #172-Ubuntu
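A tally like the one above can be produced from the remote syslog with a few lines of Python. This is only a minimal sketch: it assumes each crash logs exactly one oops header line of the usual `CPU: ... PID: ... Comm: ... <release> #<build>` shape, and the function name `count_crashes_by_kernel` is made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of the oops header line, for example:
#   CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-145-generic #171-Ubuntu
# The regex captures the kernel release and the build tag after it.
HEADER = re.compile(
    r"CPU: \d+ PID: \d+ Comm: \S+ .*?(\d+\.\d+\.\d+-\d+-\S+) (#\S+)"
)

def count_crashes_by_kernel(lines):
    """Count oops header lines per 'release build' string (hypothetical helper).

    Assumes one header line per crash; if a dump prints the header
    more than once, the count will be too high.
    """
    counts = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts
```

Feeding it the lines of the syslog file (for example `count_crashes_by_kernel(open(path))`, with `path` pointing at wherever the dumps are collected) returns a `Counter` keyed by kernel version.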
Downgrading to 4.4.0-143 now, as that build does not have the "ipv6: frags: rewrite ip6_expire_frag_queue()" change; it first appears in the 4.4.0-144-generic image. It should be clear by tomorrow whether that kernel is stable, as we are now seeing multiple crashes per day (last crash 50 minutes ago).
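To find which machines in a fleet run a build that carries the change, the release string (as printed by `uname -r`) can be checked against the 4.4.0-144 threshold mentioned above. A rough sketch, assuming the Ubuntu `4.4.0-<abi>-<flavour>` naming; the helper name `has_frag_rewrite` is hypothetical, and the threshold reflects only the observation in this report.

```python
import re

# From the report: the change first appears in the 4.4.0-144-generic image.
FIRST_AFFECTED_ABI = 144

def has_frag_rewrite(release):
    """Return True/False for Ubuntu 4.4.0-* releases, None for anything else.

    `release` is a string such as "4.4.0-146-generic". Other kernel
    series return None because this report gives no threshold for them.
    """
    match = re.fullmatch(r"4\.4\.0-(\d+)-\S+", release)
    if match is None:
        return None
    return int(match.group(1)) >= FIRST_AFFECTED_ABI
```

On a live system the argument could come from `platform.release()`. A `None` result means "unknown", not "safe".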
These are routers running NAT & firewall & some applications, with substantial IPv6 traffic.
Interestingly, the crashes only happen on bare hardware. We have a much larger number of VMs doing the same thing, most of them now running 4.4.0-146, and none of them has crashed like this. The hardware instances do have a larger number of CPU cores; the VMs only have 2 or 4.
I am also seeing crashes on the 4.15.0-48-generic HWE kernel running on Ubuntu 16.04 xenial, but have no stack trace to show yet.
Attaching kernel stack trace file containing several crashes on various servers (hessu-ipv6_expire_frag_queue-crashes.txt).
Someone has reported this same crash happening in 3-5 hours in 3 systems on Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=922488
> Someone has reported this same crash happening in 3-5 hours in 3 systems on
> Debian
That was also me.
We have since then upgraded a few of our systems to 4.19.28, and are not experiencing the issue any more. Seems like maybe something between 4.19.16 and 4.19.28 fixed it?