Created attachment 305016 [details]
dmesg

seems to be a regression since 6.5 release:
the infamous error message from the kernel on this 32c/64t threadripper:
> [ 2046.269103] perf: interrupt took too long (3141 > 3138), lowering kernel.perf_event_max_sample_rate to 63600
> [ 2405.049567] Uhhuh. NMI received for unknown reason 2d on CPU 48.
> [ 2405.049571] Dazed and confused, but trying to continue
> [ 2406.902609] Uhhuh. NMI received for unknown reason 2d on CPU 33.
> [ 2406.902612] Dazed and confused, but trying to continue
> [ 2423.978918] Uhhuh. NMI received for unknown reason 2d on CPU 33.
> [ 2423.978921] Dazed and confused, but trying to continue
> [ 2429.995160] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2429.995163] Dazed and confused, but trying to continue
> [ 2431.233575] Uhhuh. NMI received for unknown reason 3d on CPU 36.
> [ 2431.233578] Dazed and confused, but trying to continue
> [ 2442.382252] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2442.382255] Dazed and confused, but trying to continue
> [ 2442.725076] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2442.725078] Dazed and confused, but trying to continue
> [ 2442.732025] Uhhuh. NMI received for unknown reason 2d on CPU 48.
> [ 2442.732027] Dazed and confused, but trying to continue
> [ 2443.666671] Uhhuh. NMI received for unknown reason 2d on CPU 48.
> [ 2443.666673] Dazed and confused, but trying to continue
> [ 2443.756776] Uhhuh. NMI received for unknown reason 3d on CPU 39.
> [ 2443.756779] Dazed and confused, but trying to continue
> [ 2443.907309] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2443.907311] Dazed and confused, but trying to continue
> [ 2444.004281] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2444.004283] Dazed and confused, but trying to continue
> [ 2444.207944] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2444.207945] Dazed and confused, but trying to continue
> [ 2444.517408] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2444.517410] Dazed and confused, but trying to continue
> [ 2444.946941] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2444.946943] Dazed and confused, but trying to continue
> [ 2445.573807] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2445.573809] Dazed and confused, but trying to continue
> [ 2445.776108] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2445.776110] Dazed and confused, but trying to continue
> [ 2445.969029] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2445.969031] Dazed and confused, but trying to continue
> [ 2446.977458] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2446.977460] Dazed and confused, but trying to continue
> [ 2447.044329] Uhhuh. NMI received for unknown reason 2d on CPU 46.
> [ 2447.044331] Dazed and confused, but trying to continue
> [ 2447.469269] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2447.469271] Dazed and confused, but trying to continue
> [ 2447.866530] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2447.866531] Dazed and confused, but trying to continue
> [ 2448.456615] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2448.456617] Dazed and confused, but trying to continue
> [ 2448.509614] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2448.509616] Dazed and confused, but trying to continue
> [ 2448.758005] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2448.758007] Dazed and confused, but trying to continue
> [ 2449.093565] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2449.093567] Dazed and confused, but trying to continue
> [ 2449.227344] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2449.227346] Dazed and confused, but trying to continue
> [ 2449.770534] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2449.770535] Dazed and confused, but trying to continue
> [ 2449.955594] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2449.955596] Dazed and confused, but trying to continue
> [ 2450.077872] Uhhuh. NMI received for unknown reason 2d on CPU 48.
> [ 2450.077874] Dazed and confused, but trying to continue
> [ 2450.190844] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2450.190846] Dazed and confused, but trying to continue
> [ 2450.561450] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2450.561452] Dazed and confused, but trying to continue
> [ 2450.604498] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2450.604500] Dazed and confused, but trying to continue
> [ 2450.814451] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2450.814453] Dazed and confused, but trying to continue
> [ 2450.923171] Uhhuh. NMI received for unknown reason 2d on CPU 49.
> [ 2450.923173] Dazed and confused, but trying to continue
> [ 2451.084612] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2451.084614] Dazed and confused, but trying to continue
> [ 2451.793342] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2451.793343] Dazed and confused, but trying to continue
> [ 2451.793662] Uhhuh. NMI received for unknown reason 2d on CPU 48.
> [ 2451.793664] Dazed and confused, but trying to continue
> [ 2451.926819] Uhhuh. NMI received for unknown reason 3d on CPU 48.
> [ 2451.926821] Dazed and confused, but trying to continue
> [ 2452.502583] Uhhuh. NMI received for unknown reason 3d on CPU 49.
> [ 2452.502585] Dazed and confused, but trying to continue
> [ 2452.675633] Uhhuh. NMI received for unknown reason 2d on CPU 61.
> [ 2452.675636] Dazed and confused, but trying to continue
> [ 2452.974655] Uhhuh. NMI received for unknown reason 2d on CPU 48.
> [ 2452.974657] Dazed and confused, but trying to continue
> [ 7065.904855] elogind-daemon[2461]: New session c2 of user janpieter.
According to dmesg, this happens for no apparent reason (I didn't even notice it).
Some googling points to an ACPI C-state problem on AMD CPUs a few years ago. In 5.14 kernels, I didn't see it.
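To see whether the NMIs cluster on particular CPUs, the log above can be tallied with a short script. This is only a sketch: the regular expression is written against the message format shown in this report, and the SERR (0x80) and IOCHK (0x40) bit values are my assumption about how the kernel classifies the NMI reason byte, so please check them against your kernel sources before relying on them.

```python
import re
from collections import Counter

# Matches entries such as:
#   [ 2405.049567] Uhhuh. NMI received for unknown reason 2d on CPU 48.
NMI_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Uhhuh\.\s+NMI received for unknown reason "
    r"(?P<reason>[0-9a-f]{2}) on CPU (?P<cpu>\d+)\."
)

# Assumed reason bits (legacy NMI status port layout); verify before use.
SERR = 0x80
IOCHK = 0x40


def parse_nmis(text):
    """Return a list of (timestamp, reason_byte, cpu) tuples found in text."""
    return [
        (float(m["ts"]), int(m["reason"], 16), int(m["cpu"]))
        for m in NMI_RE.finditer(text)
    ]


def summarize(text):
    """Count unknown-reason NMIs per CPU and per reason byte."""
    events = parse_nmis(text)
    by_cpu = Counter(cpu for _, _, cpu in events)
    by_reason = Counter(reason for _, reason, _ in events)
    return by_cpu, by_reason


def has_known_source(reason):
    """True if the reason byte has one of the assumed SERR/IOCHK bits set."""
    return bool(reason & (SERR | IOCHK))


# A few entries copied from the log above.
sample = """\
[ 2405.049567] Uhhuh. NMI received for unknown reason 2d on CPU 48.
[ 2405.049571] Dazed and confused, but trying to continue
[ 2406.902609] Uhhuh. NMI received for unknown reason 2d on CPU 33.
[ 2429.995160] Uhhuh. NMI received for unknown reason 3d on CPU 48.
"""

by_cpu, by_reason = summarize(sample)
for cpu, count in by_cpu.most_common():
    print(f"CPU {cpu}: {count}")
for reason, count in by_reason.most_common():
    print(f"reason {reason:#04x}: {count} (known source: {has_known_source(reason)})")
```

Fed the full dmesg, this should show that most entries land on CPUs 48 and 49. The two reason values in the log, 2d and 3d, differ only in bit 4 (0x10), and neither has the assumed SERR or IOCHK bit set.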
Created attachment 305017 [details] kernel 6.5.0 config
> some googling points at a ACPI C state problem on AMD CPUs a few years ago
> in 5.14 kernels, I didn't see it.

That should be 6.4 kernels; sorry for the typo.
(In reply to Janpieter Sollie from comment #0)
> seems to be a regression since 6.5 release:
> the infamous error message from the kernel on this 32c/64t threadripper:

Can you please do bisection (see Documentation/admin-guide/bug-bisect.rst in the kernel sources for instructions)?
Is there any way I can make this more likely to happen?
I installed the 6.5 kernel 24 hours ago, and so far the issue has happened only once (the system was up all day). I also saw no reason why it would happen.
Bisecting this may be painful.
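To put a rough number on "painful": the sketch below estimates how many kernels a bisection would need and how long each would have to run before it could be called good. Everything here is an assumption on my side: the events are modelled as a Poisson process with a mean of one occurrence per 24 hours (extrapolated from a single observation), and the commit count is a placeholder that should be replaced with the output of `git rev-list --count v6.4..v6.5`.

```python
import math


def bisect_steps(n_commits):
    """Approximate number of kernels to build and test when bisecting n_commits."""
    return math.ceil(math.log2(n_commits))


def hours_to_call_good(mean_hours_between_events, max_miss_probability):
    """Hours a kernel must run event-free before calling it good.

    Assumes a Poisson process: P(no event in t hours on a bad kernel) is
    exp(-t / mean), so t >= -mean * ln(p) keeps that probability below p.
    """
    return -mean_hours_between_events * math.log(max_miss_probability)


N_COMMITS = 15000          # placeholder, not a measured value
MEAN_HOURS = 24.0          # assumption: about one occurrence per day
MISS_PROBABILITY = 0.05    # accepted chance of marking a bad kernel as good

steps = bisect_steps(N_COMMITS)
hours = hours_to_call_good(MEAN_HOURS, MISS_PROBABILITY)
print(f"{steps} steps, about {hours:.0f} hours per good verdict")
print(f"worst case: about {steps * hours / 24:.0f} days of uptime")
```

Under these assumptions each "good" verdict needs roughly three days of uptime, which is why a faster reproducer would help a lot. A "bad" verdict is cheaper, since the kernel can be marked bad as soon as the message appears.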
(In reply to Janpieter Sollie from comment #4)
> Bisecting this may be painful.

Sorry, I can't help you with that. In the meantime, what were you doing on your system that led to this bug report?
(In reply to Bagas Sanjaya from comment #5)
> Sorry, I can't help you with that. In the meantime, what were you doing
> on your system that led to this bug report?

Honestly, I do not know.
The system is mostly used as a "jump in when the desktop / laptop isn't powerful enough" device, and to test new features: it provides long-term storage, takes over DHCP/DNS functionality from the Raspberry Pi when that one needs maintenance, provides resources for parking VMs, distcc, and so on.
If it is not used for a longer time (especially during the night), it is powered off and/or put in S3.
At the time of the NMI, it was idle and not doing anything I knew of.
What I do know is that when it wakes from S3 (I hope it's related), an interrupt is triggered that nobody cares about. Linux advises me to boot with irqpoll, but I first wanted to know the reason for the "nobody cared" interrupt.
Could this be related?
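To find out which interrupt line is involved after an S3 resume, the "nobody cared" messages can be pulled out of dmesg the same way. This is a sketch only: the IRQ numbers in the sample are hypothetical (the report does not say which IRQ is affected), and the pattern assumes the usual kernel wording that suggests the irqpoll option.

```python
import re
from collections import Counter

# Matches kernel messages of the form:
#   irq 7: nobody cared (try booting with the "irqpoll" option)
IRQ_RE = re.compile(r"irq (\d+): nobody cared")


def nobody_cared_irqs(text):
    """Count 'nobody cared' reports per IRQ number in a dmesg dump."""
    return Counter(int(m.group(1)) for m in IRQ_RE.finditer(text))


# Hypothetical sample lines; the real IRQ number is not given in this report.
sample_irq_log = """\
[ 7001.000001] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 7001.000050] Disabling IRQ #7
[ 9100.000002] irq 7: nobody cared (try booting with the "irqpoll" option)
"""

for irq, count in nobody_cared_irqs(sample_irq_log).items():
    print(f"IRQ {irq}: reported {count} time(s)")
```

The reported number can then be looked up in /proc/interrupts to see which device shares that line. Whether that device has anything to do with the NMIs is a separate question this does not answer.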
(In reply to Janpieter Sollie from comment #6)
> Could this be related?

This seems to be a pure hardware issue (see [1]).

[1]: https://lore.kernel.org/all/e08e33d5-4f6d-91aa-f335-9404d16a983c@amd.com/
Thank you for the link.
Does this mean that if I build the kernel without IBS support, the error won't show up?
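Before rebuilding anything, it may be worth checking whether the running kernel exposes the IBS PMUs at all. The sketch below assumes that, when IBS is available, perf registers event sources named ibs_fetch and ibs_op under /sys/bus/event_source/devices; the `sysfs_root` parameter is my own addition so the function can be tried against a fake directory tree. It does not tell you whether disabling IBS avoids the NMIs.

```python
from pathlib import Path


def ibs_pmus(sysfs_root="/sys"):
    """Return the names of IBS event sources exposed by the running kernel.

    Assumption: IBS PMUs appear as ibs_* entries under
    <sysfs_root>/bus/event_source/devices.
    """
    base = Path(sysfs_root) / "bus" / "event_source" / "devices"
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.glob("ibs_*"))


found = ibs_pmus()
if found:
    print("IBS event sources present:", ", ".join(found))
else:
    print("no IBS event sources found")
```

An empty result on the affected machine would suggest IBS is not in play on that kernel; a non-empty one only confirms that it is available, not that it is in use.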
On 06/10/2023 13:36, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217857
>
> --- Comment #8 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
> thank you for the link.
> Does this mean if I don't compile with IBS, the error won't show?

This is a known hardware issue; see [1]. You may have to upgrade to an unaffected CPU (such as Zen 3).

Bye!

[1]: https://lore.kernel.org/all/20210317084829.GA474581@gmail.com/