Bug 217544 - kernel fault on hibernation: get_zeroed_page/swsusp_write
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: i386 Linux
Importance: P3 high
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-12 09:14 UTC by Elmar Stellnberger
Modified: 2023-07-19 10:27 UTC
CC List: 1 user

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with backtrace at hibernation, 6.3.7-desktop-1.mga9 (63.62 KB, text/plain)
2023-06-12 09:20 UTC, Elmar Stellnberger
/proc/config.gz (64.53 KB, application/gzip)
2023-06-12 09:37 UTC, Elmar Stellnberger
disassembly by decode_stacktrace (3.35 KB, text/plain)
2023-06-12 14:04 UTC, Elmar Stellnberger
switched on debug options before third hibernation attempt (489 bytes, text/plain)
2023-06-12 14:38 UTC, Elmar Stellnberger
systemjournal with 3xhibernation (last incomplete) (799.63 KB, application/x-xz)
2023-06-12 14:49 UTC, Elmar Stellnberger

Description Elmar Stellnberger 2023-06-12 09:14:28 UTC
page allocation error using kernel 6.3.7-desktop-1.mga9 #1 SMP PREEMPT_DYNAMIC, from Fr 09 Jun 2023 22:57:31, Key ID b742fa8b80420f66; see the backtrace in the dmesg
> cat /proc/cpuinfo
siblings	: 4
core id		: 1
cpu cores	: 2
...
Type: regression. Hibernation worked with the previous kernel, 6.3.6 (Mo 05 Jun 2023 21:37:15, Key ID b742fa8b80420f66), before today's update.
Comment 1 Elmar Stellnberger 2023-06-12 09:20:14 UTC
Created attachment 304398
dmesg with backtrace at hibernation, 6.3.7-desktop-1.mga9
Comment 2 Elmar Stellnberger 2023-06-12 09:37:17 UTC
Created attachment 304399
/proc/config.gz

Currently I can't compile the kernel myself; maybe I can as soon as Friday. Here are some config options:
CONFIG_DEBUG_KERNEL=y
CONFIG_KALLSYMS=y, CONFIG_KALLSYMS_ALL=y, CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO_NONE=y (the other debug info options, *DWARF* etc., are not set)
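
Checking options like the ones listed above by hand is error-prone, so here is a minimal sketch of how they could be read programmatically. It assumes the usual kernel config text format (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`); the function names `parse_kernel_config` and `read_proc_config` are made up for this illustration, and the inline sample only mirrors the options quoted in this comment rather than the full attachment.

```python
import gzip
import re

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Return {option: value}; options marked 'is not set' map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def read_proc_config(path="/proc/config.gz"):
    """Parse the running kernel's config; the file exists only if the
    kernel was built with CONFIG_IKCONFIG_PROC."""
    with gzip.open(path, "rt") as handle:
        return parse_kernel_config(handle.read())


# Illustrative sample mirroring the options quoted in this comment.
SAMPLE = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_KALLSYMS=y
CONFIG_DEBUG_INFO_NONE=y
# CONFIG_KGDB is not set
"""

config = parse_kernel_config(SAMPLE)
for name in ("CONFIG_DEBUG_KERNEL", "CONFIG_DEBUG_INFO_NONE", "CONFIG_KGDB"):
    value = config.get(name)
    print(name, "->", "not set" if value is None else value)
```

On the affected machine, `read_proc_config()` could be called instead of parsing the sample string.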

Shall I enable debug output for some specific source files or modules? Is there any way to capture /proc/kcore directly on an oops (CONFIG_KGDB is not set)?
Comment 3 Elmar Stellnberger 2023-06-12 14:04:00 UTC
Created attachment 304403
disassembly by decode_stacktrace

I tried scripts/extract-vmlinux and scripts/decode_stacktrace.sh, but line numbers do not seem to be present in either System.map or the binary.
Comment 4 Elmar Stellnberger 2023-06-12 14:38:12 UTC
Created attachment 304407
switched on debug options before third hibernation attempt

The first hibernation attempt resulted in the backtrace you can see in the dmesg above. My second attempt, from a text console (vt03 or so), worked without errors. The third I tried from the GUI/X11 again (see the attached debug options I had turned on). On that third attempt something strange happened: it seemed to write to disk as it should and the screen turned black, but the power LED and button stayed lit. Pressing the power button to wake the machine had no effect, nor did the SysRq keys (alas, I forgot to write 511 to /proc/sys/kernel/sysrq). After a hard power reset it booted as if it had not been hibernated. On the first attempt I could see lengthy, intermittent disk access. On the third attempt I waited for a considerable time.
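
For reference, the value 511 mentioned above is a bitmask. The sketch below decodes such a value; the bit meanings are taken from the kernel's sysrq documentation (Documentation/admin-guide/sysrq.rst) as far as I recall it, so please check them against your kernel version. The helper name `decode_sysrq` is invented for this example.

```python
# Bit values for /proc/sys/kernel/sysrq, per the kernel sysrq documentation
# (assumed from memory of sysrq.rst; verify against your kernel tree).
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the list of SysRq function groups enabled by `value`."""
    if value == 0:
        return []                          # SysRq disabled completely
    if value == 1:
        return list(SYSRQ_BITS.values())   # 1 means "enable everything"
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


for group in decode_sysrq(511):
    print("enabled:", group)
```

Since 511 has every documented bit set, it enables all function groups, which is why it would have made the SysRq keys usable during the hang.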
Comment 5 Elmar Stellnberger 2023-06-12 14:49:53 UTC
Created attachment 304408
systemjournal with 3xhibernation (last incomplete)

It seems the drm debug flag was too verbose, and the memory debug flags had no effect.
Comment 6 Bagas Sanjaya 2023-06-12 23:41:52 UTC
(In reply to Elmar Stellnberger from comment #0)
> page allocation error using kernel 6.3.7-desktop-1.mga9 #1 SMP
> PREEMPT_DYNAMIC, from Fr 09 Jun 2023 22:57:31, Key ID b742fa8b80420f66; see
> the backtrace in the dmesg
> > cat /proc/cpuinfo
> siblings      : 4
> core id               : 1
> cpu cores     : 2
> ...
> type: regression, worked with the previous kernel, namely 6.3.6, Mo 05 Jun
> 2023 21:37:15, Key ID b742fa8b80420f66 before updating today

Can you try latest mainline (currently v6.4-rc6)?

Also, can you perform bisection (see Documentation/admin-guide/bug-bisect.rst)?
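
To illustrate what bisection involves: it is a binary search over the commits between a known-good and a known-bad kernel. The sketch below shows only the idea, with hypothetical commit labels and a stand-in `is_bad` predicate; the real workflow uses `git bisect` as described in the document referenced above, with a build and hibernation test at each step.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest; commits[0] is assumed known
    good and commits[-1] known bad. Returns (commit, steps_tested).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid      # the culprit is at mid or earlier
        else:
            good = mid     # the culprit is after mid
    return commits[bad], steps


# Hypothetical history: 17 commits, where the fault first appears at "c11".
history = [f"c{i}" for i in range(17)]
culprit, tested = first_bad(history, lambda c: int(c[1:]) >= 11)
print("first bad commit:", culprit, "after", tested, "tests")
```

Each step halves the remaining range, so the number of kernels to build and test grows only logarithmically with the number of commits between the two versions.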
Comment 7 Elmar Stellnberger 2023-06-23 11:52:06 UTC
I would have given you more information on this if I had succeeded in reproducing the backtrace. However, I still have that kernel version installed, along with a copy of /proc/kcore, which alas was not taken directly after the first unsuccessful hibernation attempt but a full hibernate-resume cycle later, during the same boot. Without a stack trace I wouldn't know what to look for in the kcore, alas. BTW: it would be nice if someone could take a short look at https://bugs.mageia.org/show_bug.cgi?id=32044; it looks somewhat like a compiler issue to me. Tell me if I err (I don't know whether the Mageia guys will).
Comment 8 Elmar Stellnberger 2023-06-23 16:03:46 UTC
The kcore and all files of the kernel package that was in use when the backtrace was generated are now available as .xz archives at https://upload.elstel.info (use DANE if you like: elstel.org/atea/ - don't run the version published here with asan, alas, as it has a few bugs).
Comment 9 Elmar Stellnberger 2023-07-19 10:27:39 UTC
The bug itself did not reproduce on my machine, but I have just discovered a similar, related issue (AFAIK it concerns the page allocator), described in Bug 217684. (I will probably stay online for some 30 minutes or more, trying to quit programs and then retry the hibernation.) The kernel versions are quite similar, and there appears to be an issue related to memory allocation that pops up every now and then.
