Hi,

I have been getting random kernel panics on my Arch Linux server since upgrading past kernel 6.3.7. It is a headless server with no X11 and no graphics card, and it has an Intel processor (not AMD).

On 6.3.7 everything was fine. Since upgrading to 6.3.9, 6.4.3, 6.4.4, and now 6.4.12, I get random crashes (sometimes after 24h, sometimes 48h, sometimes a little longer...).

There is no log in journald/syslog. I managed to capture only these three lines (by mounting /var/log/journal outside my NVMe boot disk):

BUG: unable to handle page fault for address: 00000000352aa941
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page

Some information:
uname -a : Linux xxxxorg 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux
dmesg : http://ix.io/4FE8
journalctl -b -1 | tail -1000 : http://ix.io/4FE9 (only the last 1000 lines, the full log is too big)
lspci -vvv : http://ix.io/4FEa
lsmod : http://ix.io/4FEf
lscpu : http://ix.io/4FEh

Of course, if I roll back to 6.3.7, the crashes stop.

Could you please help me debug this?

Thanks.
There aren't that many commits between 6.3.7 and 6.3.9, so your best option is to bisect the regression yourself, following https://docs.kernel.org/admin-guide/bug-bisect.html

It may take quite some time, but it's the best shot you've got since your issue is not widespread.
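To give a feel for why bisection "may take quite some time" with a crash that only shows up after a day or two, here is a minimal Python sketch. It assumes bisection roughly halves the candidate range at each step, so it needs about log2(N) test kernels. The function name and both parameters are hypothetical; the thread does not state the actual number of commits between 6.3.7 and 6.3.9.

```python
import math


def estimate_bisect(num_commits, hours_per_test):
    """Estimate bisection effort for a range of candidate commits.

    num_commits: assumed number of commits between the good and bad
        releases (hypothetical, not taken from the thread).
    hours_per_test: how long each kernel must run before it can be
        called "good" (the reporter saw crashes after 24h, 48h or more).

    Returns (steps, total_hours).
    """
    if num_commits <= 1:
        # Zero or one candidate left: nothing to bisect.
        return 0, 0.0
    # Each step halves the remaining range, so about log2(N) steps.
    steps = math.ceil(math.log2(num_commits))
    return steps, steps * float(hours_per_test)


if __name__ == "__main__":
    steps, hours = estimate_bisect(400, 72)
    print(f"about {steps} steps, {hours:.0f} hours of run time")
```

With a hypothetical 400 commits and 72 hours of uptime per test kernel, this gives about 9 steps and 648 hours. Because the crash is intermittent, a kernel marked "good" too early would mislead the bisection, which is why the per-test wait dominates the cost.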
(In reply to cyayon from comment #0)
> Hi,
>
> I have random kernel panic on my archlinux server, since upgrade from
> > kernel 6.3.7.
> It's a headless server, no X11, no graphic card, and it is an intel
> processor (not AMD).
>
> On 6.3.7, everything was fine, since upgrade to 6.3.9, 6.4.3, 6.4.4, and now
> 6.4.12, I got random crash (sometime 24h, 48h, sometime a little bit
> more...).
>
> No log in journald/syslog, I managed to get only these three lines (with
> mounting /var/log/journal outside my nvme boot disk) :
>
> BUG: unable to handle page fault for address: 00000000352aa941
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
>
> Some information :
> uname -a : Linux xxxxorg 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug
> 2023 00:38:14 +0000 x86_64 GNU/Linux
> dmesg : http://ix.io/4FE8
> journalctl -b -1 | tail -1000 : http://ix.io/4FE9 (only last 1000 lines, too
> big)
> lspci -vvv : http://ix.io/4FEa
> lsmod : http://ix.io/4FEf
> lscpu : http://ix.io/4FEh
>
> Of course, if I rollback to 6.3.7, no more crash.
>
> Could you please help me to debug ?
>

Please test latest mainline first.
Hello,

Isn't 6.4.12 mainline?

Thanks.
6.4.15 is.
Sorry, I understand. 6.4.12 is the latest Arch Linux 6.4 kernel. I will have to wait for 6.5.x …
(In reply to Artem S. Tashkinov from comment #4)
> 6.4.15 is.

Nope. I mean v6.x and v6.x-rcy (as in Linus's tree), not v6.x.y as in the stable one.
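The naming distinction above (v6.x and v6.x-rcy for Linus's tree, v6.x.y for stable) can be sketched as a small Python helper. This is only an illustration of the convention as described in this thread; the function name and the returned labels are hypothetical, and distribution suffixes such as "-arch1-1" are simply ignored.

```python
import re

# major.minor, optional .patch, optional -rcN (the hyphen is optional so
# that the informal spelling "6.6rc1" used in this thread also matches).
_VERSION = re.compile(r"^v?(\d+)\.(\d+)(?:\.(\d+))?(?:-?rc(\d+))?")


def classify_kernel_version(version):
    """Classify a kernel version string by its naming pattern only.

    Returns "stable" for v6.x.y, "mainline-rc" for v6.x-rcy,
    "mainline" for v6.x, and "unknown" if nothing matches.
    """
    match = _VERSION.match(version.strip())
    if match is None:
        return "unknown"
    patch, rc = match.group(3), match.group(4)
    if patch is not None:
        # A third numeric component marks a stable point release.
        return "stable"
    if rc is not None:
        return "mainline-rc"
    return "mainline"


if __name__ == "__main__":
    for v in ("6.4.12-arch1-1", "6.4.15", "v6.6-rc1", "v6.5"):
        print(v, "->", classify_kernel_version(v))
```

Under this convention both 6.4.12 and 6.4.15 are stable releases, while v6.6-rc1 is a mainline release candidate. Note that the check looks at tag-style names; a running mainline kernel may still report a trailing ".0" in uname, which this sketch would label as stable.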
Hi,

It crashed again last night (kernel 6.4.12). I managed to get some logs from syslog-ng (nothing from journald).

Here are the last logs just before the crash:

http://ix.io/4FTW

It seems to be related to nft.

Thanks.
(In reply to cyayon from comment #7)
> Hi,
>
> This night crash again (kernel 6.4.12). I manage to got some logs from
> syslog-ng (no log from journald).
>
> Here are the last logs just before the crash.
>
> http://ix.io/4FTW
>
> It seems to be related to nft.
>
> thanks.

OK, then perform bisection to find the culprit commit that introduces your regression. If you don't know how to do so, see kernel documentation [1].

[1]: https://docs.kernel.org/admin-guide/bug-bisect.html
(In reply to cyayon from comment #7)
> Hi,
>
> This night crash again (kernel 6.4.12). I manage to got some logs from
> syslog-ng (no log from journald).
>
> Here are the last logs just before the crash.
>
> http://ix.io/4FTW
>
> It seems to be related to nft.
>
> thanks.

Now v6.6-rc1 has been released, please test. Since you're about to compile vanilla kernel, see ArchWiki [1] for instructions.

[1]: https://wiki.archlinux.org/title/Kernel/Traditional_compilation
Hi,

Yesterday I opened a ticket with netfilter (via email). Pablo N. told me the issue comes from commit bdace3b1a51887211d3e49417a18fdbd315a313b. He also asked me to test 6.4.15 instead of 6.5.2, which is a little behind for this issue.

I don't know about 6.6rc1 vs 6.4.15.

I am currently testing and will keep you informed here.

Thanks
On 11/09/2023 19:54, bugzilla-daemon@kernel.org wrote:
> I don't know about 6.6rc1 vs 6.4.15.
>

The former is a release candidate version, tagged from Linus's tree (aka mainline). It is primarily used for testing before the official release is made. The latter is a stable kernel with fixes backported from mainline. It is the recommended kernel to run in production.

For more information, see [1].

Thanks.

[1]: https://kernel.org/category/releases.html
I meant that I didn't know whether 6.6rc1 includes the revert of bdace3b1a51887211d3e49417a18fdbd315a313b (like 6.4.15 does).
On 11/09/2023 20:05, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217884
>
> --- Comment #12 from cyayon@nbux.org ---
> I would like to say that I didn't know if 6.6rc1 include revert
> bdace3b1a51887211d3e49417a18fdbd315a313b (like 6.4.15).
>

It should already include 26b5a5712eb85e ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain") as the fix.
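One way to check a statement like this yourself is to ask git whether the fix commit is an ancestor of the release tag. The sketch below wraps `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second and 1 when it is not. The helper names and the `repo` argument are hypothetical, and it assumes a local clone of Linus's tree with tags fetched. It only works for commits that exist with the same hash in the tree being checked; stable backports and reverts are separate commits with their own hashes.

```python
import subprocess


def ancestor_check_command(commit, ref):
    """Build the git command that tests whether `commit` is contained in `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def contains_commit(repo, commit, ref):
    """Return True if `commit` is an ancestor of `ref` in the clone at `repo`.

    `repo` is a path to a local kernel clone (an assumption of this sketch).
    """
    result = subprocess.run(ancestor_check_command(commit, ref), cwd=repo)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other status means git could not answer (unknown ref, no repo, ...).
    raise RuntimeError(f"git exited with status {result.returncode}")


if __name__ == "__main__":
    # Hypothetical path to a local clone of the mainline tree.
    print(contains_commit("linux", "26b5a5712eb85e", "v6.6-rc1"))
```

Separating the command builder from the subprocess call keeps the sketch easy to inspect without a kernel checkout at hand.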
thanks
Hi,

No crash for 3 days with 6.4.15. I will wait a few more days, but it should be OK. Many thanks!

I asked Pablo N. whether the patch / revert has been merged into 6.5.3 and am waiting for his answer…
Oh, no... It crashed again this morning :(

Here is the journald log: http://ix.io/4Gvd (6.4.15)