Bug 104771
Summary: | System hangs on resume from hibernate. Black screen and hardware lockup upon resume. Hardware: Lenovo T450s | ||
---|---|---|---|
Product: | Power Management | Reporter: | Matt Schepers (mattschepers) |
Component: | Hibernation/Suspend | Assignee: | Chen Yu (yu.c.chen) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | aaron.lu, christoph, d.r.vanrossum, ddurdle, garkein, hannes.fuchs, heiko+kernel, hemant, iamtiancaif, ishitatsuyuki, j.keil, mail, matt, mattschepers, ptah.peteh, rafaeln.dev, rui.zhang, stefan.hoelldampf, tbl0605, tom, yu.c.chen |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.1.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
duplicated again on 9-23 with 4.1.7
Patch to workaround panic during resume from hibernation
serial output from debug kernel - need advice
serial output from debug kernel - NO no_console_suspend parameter
log with call trace
hibernate error
failed resume output
Force set up the temporary page table for non-boot cpus before restoring the page frames
disable BUG_ON when resuming in IRQ rcu grace period
Screenshot of general protection fault
Test alloc app
Kernel 4.5.4 config (Lenovo G710)
chromium damage hibernation |
Description
Matt Schepers
2015-09-18 18:13:37 UTC
Are there logs from when the problem occurred?

Hi Aaron,
Apologies for the late reply - I've been out of town. I was able to reproduce the problem again by hibernating and resuming repeatedly. I've attached the journalctl -b -1 output here. If there is a better log, let me know.

Looking at the journalctl output, it was originally unclear to me where the problem was too, so I wrapped my system's hibernation command in a shell script that also writes to the log when the script is called.
---------
#!/bin/bash
logger 'CUSTOM: Starting hibernate!!!'
systemctl hibernate
---------
As you can see from the journalctl output, the hibernate script was called several times and the system hibernated and resumed successfully each time. The last time the script was called, the kernel didn't seem to pass any information to the log about freezing processes, shutting down hardware, etc. However, it still powered off the machine as if it were really hibernating. Upon resume the kernel tries to read an image from disk after the encryption password is given, but fails and leaves me with locked-up hardware and a black screen. I'm assuming there are no logs from the resume process because of the encryption, but I don't know.

Does that better describe the issue? I just updated to the 4.1.7 mainline kernel and it has the same issue.

There is someone on the Arch user forums who appears to have the same issue with the same machine here: https://bbs.archlinux.org/viewtopic.php?id=201990
His logs seem to show the same kind of failure that mine do.

Thank you for getting back to me!
-Matt

Created attachment 188221 [details]
duplicated again on 9-23 with 4.1.7
Here is the journalctl output I reference in my second post!
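The check described above (a wrapper-script marker that is not followed by any kernel power-management message) can be automated. The following is only a minimal Python sketch: it assumes the journal has been saved as plain text, that the marker is the string logged by the wrapper script, and that kernel PM messages contain "kernel: PM: " as in the log excerpts in this report. The function name is hypothetical.

```python
import re

# Marker written by the wrapper script via `logger`.
MARKER = "CUSTOM: Starting hibernate!!!"
# Kernel power-management messages as they appear in journal text output.
KERNEL_PM = re.compile(r"\bkernel: PM: ")


def silent_attempts(lines):
    """Return the 0-based indices of hibernate attempts that were not
    followed by any kernel PM message before the next attempt."""
    seen_pm = []  # one flag per attempt: did a kernel PM line follow it?
    for line in lines:
        if MARKER in line:
            seen_pm.append(False)
        elif seen_pm and KERNEL_PM.search(line):
            seen_pm[-1] = True
    return [i for i, seen in enumerate(seen_pm) if not seen]
```

Feeding it the lines of a saved `journalctl -b -1` output would list the attempts where the kernel logged nothing, which is the symptom reported for the failing cycle.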
(In reply to Matt Schepers from comment #2)
> Does that better describe the issue? I just updated to the 4.1.7
> mainline kernel and it has the same issue.

Yes, that's clear now. So the log we want to see does not exist :-(
And about encryption, what is it? Can you please elaborate this? Thanks.

(In reply to Aaron Lu from comment #4)
> (In reply to Matt Schepers from comment #2)
> > Does that better describe the issue? I just updated to the 4.1.7
> > mainline kernel and it has the same issue.
>
> Yes, that's clear now. So the log we want to see does not exist :-(

Is there a way to live-debug the kernel and get its output (I could do it if you show me the docs)? How about a debug dump to /boot?

> And about encryption, what is it? Can you please elaborate this? Thanks.

FDE using LVM and LUKS. This is the setup I've got:
https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#LVM_on_LUKS

I'm not familiar with encryption; I guess the kernel has no knowledge of the encryption? In other words, is the encryption handled entirely in user space? I doubt the resume failure has something to do with the encryption; is it possible to test without the encryption setup?

I am the other user who has reported this problem on the Arch forums.
Aaron - what logs exactly are you looking for? I have pasted the output of journalctl here - https://bbs.archlinux.org/viewtopic.php?id=201990 but obviously you have seen that. Is there a document somewhere that describes how to create useful logs for debugging hibernate?
FWIW this problem indeed has nothing to do with encryption. I am not using encryption and I see this problem often (not always though). Whenever hibernate/resume fails - the hibernate image is not written to disk properly and "PM: Syncing filesystems ... done." is missing from logs.
Thanks again for looking into it.

Aaron and Rafael, is there a way to redirect the kernel's debug/log output to a file on /boot or a USB drive?
I would be happy to reproduce this again to generate that output.

(In reply to Hemant Kumar from comment #7)
> I am the other user who has reported this problem on the Arch forums.
>
> Aaron - what logs exactly are you looking for? I have pasted the output of

Logs that have error messages.
But obviously in your cases, when the error occurs, the log will not be properly written to disk.
A serial log is helpful, but that isn't possible for laptops.

> journalctl here - https://bbs.archlinux.org/viewtopic.php?id=201990 but
> obviously you have seen that. Is there a document somewhere that describes

I didn't, but I assume it is more or less the same as shown in comment #1, which says hibernate and resume succeeded several times; unfortunately, logs of successful cycles are not helpful for debugging.

> how to create useful logs for debugging hibernate?
>
> FWIW this problem indeed has nothing to do with encryption. I am not using
> encryption and I see this problem often (not always though). Whenever

Good to know this, thanks.
One thing to make sure: is your laptop the same as Matt's? I need to make sure we are dealing with the same problem in this bug.

> hibernate/resume fails - the hibernate image is not written to disk properly
> and "PM: Syncing filesystems ... done." is missing from logs.

Do I understand this problem correctly that:
1 It happens sometimes, but not usually the first time;
2 It happens at hibernation phase, instead of resume phase;
3 This occurs with a swap file, instead of a swap partition;
4 This occurs with or without using encryption of the disk partition.
Thanks.

> 3 This occurs with a swap file, instead of a swap partition;
I am using a swap partition (more than double the size of my RAM). Here is what my kernel boot parameters look like:
root=UUID=4a63e4a9-d35a-4c0b-8a08-c80a418e98bb rw resume=UUID=be708e5c-2fc7-41f4-ab28-8b2ac565516e i915.enable_ips=0
Also, I am using a Thinkpad T450s. I am not sure which system Matt is using, but it looks like a Thinkpad.

In response to Aaron's comment 9:
> Good to know this, thanks.
> One thing to make sure: is your laptop the same as Matt's? I need to make sure
> we are dealing with the same problem in this bug.

It is the same machine. I have a Lenovo T450s as well.

> Do I understand this problem correctly that:
> 1 It happens sometimes, but not usually the first time;

Correct.

> 2 It happens at hibernation phase, instead of resume phase;

I believe this to be true, but am not qualified to say for sure. The logs do not show information from the kernel showing successful freezing of processes and hardware WHEN it occurs.

> 3 This occurs with a swap file, instead of a swap partition;

I have a swap partition. This occurs for me with the swap partition implementation. I have never tried a swap file; the way I named the thread was ambiguous. Sorry!

> 4 This occurs with or without using encryption of the disk partition.

It appears so. He has no encryption, but I do.

(In reply to Hemant Kumar from comment #11)
> Also, I am using Thinkpad T450s, I am not sure which system Matt is using
> but looks like a Thinkpad.

Correct.

Add Yu.
Yu, I remember you have some patches related to hibernation. Can you please post them here or show us the link to those patches so that Matt and Hemant can give them a try? Thanks.

Created attachment 188341 [details]
Patch to workaround panic during resume from hibernation
This patch is to work around the panic during resume from hibernation.
I think we can boot the system with init=/bin/bash and compile the disk driver as built-in (CONFIG_SATA_AHCI=y for example). Then, after first booting into the system,
try:
swapon /dev/sda3 (your actual disk partition, not uuid please)
then echo disk > /sys/power/state,
then append resume=/dev/sda3 to the cmdline of the second kernel (not uuid please)
and do some stress testing?
Or enable dynamic debug by appending:
dyndbg='file kernel/power/hibernate.c +p;file kernel/power/snapshot.c +p'
Thanks
Yu
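Since the test above asks for resume= to be given as a device path rather than a UUID, it may help to verify what the running kernel was actually booted with. This is a small Python sketch under that assumption only; the function name and the returned labels are hypothetical, and it does not try to handle quoted parameters such as the dyndbg value.

```python
def check_resume_param(cmdline):
    """Classify the resume= parameter of a kernel command line string.

    Returns ("device", value) for a /dev/... path, ("other", value) for
    anything else (for example UUID=...), or ("missing", None).
    """
    for token in cmdline.split():
        if token.startswith("resume="):
            value = token[len("resume="):]
            if value.startswith("/dev/"):
                return ("device", value)
            return ("other", value)
    return ("missing", None)
```

On a running system the string could be read from /proc/cmdline, for example `check_resume_param(open("/proc/cmdline").read())`.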
Created attachment 188481 [details]
serial output from debug kernel - need advice

(In reply to Aaron Lu from comment #9)
> Logs that have error messages.
> But obviously in your cases, when the error occurs, the log will not be
> properly written to disk.
> Serial log is helpful, but that isn't possible for laptops.

Aaron,
I was able to build a debug kernel with usb-to-serial debugging enabled. I was able to get some output to the serial console by appending "debug ignore_loglevel no_console_suspend=1 console=tty console=ttyUSB0,115200n8" to GRUB_CMDLINE_LINUX in grub. I was not able to get the system to successfully hibernate and resume even once in this configuration. It appears that the serial output really slows everything down. The output from that test run is attached here, even though it doesn't show the issue we're talking about. You will have to tell me what options to enable to get the output you want on the serial console. Please advise. I will try Chen's patch next.

Created attachment 188491 [details]
serial output from debug kernel - NO no_console_suspend parameter
This time I removed the no_console_suspend parameter from the boot commands and I was able to get it to hibernate and resume successfully a few times with the serial console working. After about 6 cycles I was able to reproduce the failure, but the serial log doesn't show anything.
Created attachment 188631 [details]
log with call trace
I ran a bash script that repeated the built in test interface many times:
--------------
#!/bin/bash
for i in {1..100}
do
echo core > /sys/power/pm_test
echo shutdown > /sys/power/disk
echo disk > /sys/power/state
sleep 5
done
--------------
After a few runs I stopped the script and checked my logs. A warning and a call trace came up. Does this mean anything?
Full log is in the attachment.
Sep 26 23:04:56 habanero kernel: PM: Preallocating image memory...
Sep 26 23:04:56 habanero kernel: ------------[ cut here ]------------
Sep 26 23:04:56 habanero kernel: WARNING: CPU: 1 PID: 579 at mm/page_alloc.c:2421 __alloc_pages_nodemask+0x9e9/0xa00()
Sep 26 23:04:56 habanero kernel: Modules linked in: bnep bluetooth fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge ebtable_filter ebtables ip6_tables vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) xt_conntrack vboxdrv(OE) iptable_raw iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack arc4 uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops v4l2_common iTCO_wdt videodev iTCO_vendor_support media iwlmvm snd_hda_codec_hdmi intel_rapl iosf_mbi snd_hda_codec_realtek mac80211 x86_pkg_temp_thermal snd_hda_codec_generic coretemp kvm_intel snd_hda_intel kvm snd_hda_controller snd_hda_codec iwlwifi snd_hda_core snd_hwdep snd_seq snd_seq_device rtsx_pci_ms cfg80211 joydev snd_pcm thinkpad_acpi lpc_ich memstick i2c_i801 snd_timer wmi mei_me snd rfkill
Sep 26 23:04:56 habanero kernel: mei shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt 8021q garp stp llc mrp hid_logitech_hidpp hid_logitech_dj i915 rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper drm ghash_clmulni_intel e1000e serio_raw rtsx_pci mfd_core ptp pps_core video
Sep 26 23:04:56 habanero kernel: CPU: 1 PID: 579 Comm: jbd2/dm-2-8 Tainted: G OE 4.1.7-200.fc22.x86_64 #1
Sep 26 23:04:56 habanero kernel: Hardware name: LENOVO 20BXCTO1WW/20BXCTO1WW, BIOS JBET51WW (1.16 ) 07/08/2015
Sep 26 23:04:56 habanero kernel: 0000000000000000 00000000d582e0f7 ffff8803488ff958 ffffffff8179972d
Sep 26 23:04:56 habanero kernel: 0000000000000000 0000000000000000 ffff8803488ff998 ffffffff810a165a
Sep 26 23:04:56 habanero kernel: 0002085800000001 0000000000020858 0000000000000800 0000000000000000
Sep 26 23:04:56 habanero kernel: Call Trace:
Sep 26 23:04:56 habanero kernel: [<ffffffff8179972d>] dump_stack+0x45/0x57
Sep 26 23:04:56 habanero kernel: [<ffffffff810a165a>] warn_slowpath_common+0x8a/0xc0
Sep 26 23:04:56 habanero kernel: [<ffffffff810a178a>] warn_slowpath_null+0x1a/0x20
Sep 26 23:04:56 habanero kernel: [<ffffffff811b7319>] __alloc_pages_nodemask+0x9e9/0xa00
Sep 26 23:04:56 habanero kernel: [<ffffffff81200521>] alloc_pages_current+0x91/0x110
Sep 26 23:04:56 habanero kernel: [<ffffffff811acbef>] __page_cache_alloc+0xaf/0xd0
Sep 26 23:04:56 habanero kernel: [<ffffffff811acd84>] pagecache_get_page+0x84/0x1e0
Sep 26 23:04:56 habanero kernel: [<ffffffff812620df>] __getblk_slow+0xcf/0x2d0
Sep 26 23:04:56 habanero kernel: [<ffffffff81262341>] __getblk_gfp+0x61/0x70
Sep 26 23:04:56 habanero kernel: [<ffffffff81311b21>] jbd2_journal_get_descriptor_buffer+0x51/0xd0
Sep 26 23:04:56 habanero kernel: [<ffffffff8130a294>] jbd2_journal_commit_transaction+0xb74/0x1840
Sep 26 23:04:56 habanero kernel: [<ffffffff810d21ed>] ? sched_clock_cpu+0x9d/0xb0
Sep 26 23:04:56 habanero kernel: [<ffffffff810db5a1>] ? put_prev_entity+0x31/0x410
Sep 26 23:04:56 habanero kernel: [<ffffffff8110938e>] ? try_to_del_timer_sync+0x5e/0x90
Sep 26 23:04:56 habanero kernel: [<ffffffff8130e98a>] kjournald2+0xca/0x270
Sep 26 23:04:56 habanero kernel: [<ffffffff810e4d90>] ? wake_atomic_t_function+0x70/0x70
Sep 26 23:04:56 habanero kernel: [<ffffffff8130e8c0>] ? commit_timeout+0x10/0x10
Sep 26 23:04:56 habanero kernel: [<ffffffff810c0bf8>] kthread+0xd8/0xf0
Sep 26 23:04:56 habanero kernel: [<ffffffff810c0b20>] ? kthread_worker_fn+0x180/0x180
Sep 26 23:04:56 habanero kernel: [<ffffffff817a00e2>] ret_from_fork+0x42/0x70
Sep 26 23:04:56 habanero kernel: [<ffffffff810c0b20>] ? kthread_worker_fn+0x180/0x180
Sep 26 23:04:56 habanero kernel: ---[ end trace b454164d88e5c38f ]---
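For anyone comparing traces from several runs, the warning site and the call chain can be pulled out of such a log mechanically. The sketch below is an illustration only: the regular expressions are written against the line format shown in this trace, the function name is hypothetical, and frames marked with "?" are skipped because the kernel itself flags them as uncertain.

```python
import re

# "[<address>] symbol+0xoff/0xsize", optionally with a leading "? ".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# "WARNING: CPU: n PID: n at file:line symbol+0x..."
WARNING = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\S+?)\+0x")


def parse_trace(lines):
    """Return ((file, line, symbol) or None, [frame symbols in log order])."""
    site = None
    frames = []
    for line in lines:
        m = WARNING.search(line)
        if m:
            site = (m.group(1), int(m.group(2)), m.group(3))
            continue
        m = FRAME.search(line)
        if m and not m.group(1):  # drop "?" entries
            frames.append(m.group(2))
    return site, frames
```

Applied to the trace above, this would report the warning at mm/page_alloc.c:2421 and a chain running from dump_stack through the jbd2 commit path to ret_from_fork.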
Created attachment 188661 [details]
hibernate error
This general protection fault was triggered on resume.
So I got a similar GPF on resume. Again, not always. But I think this may be unrelated to the issue we reported earlier. The GPF was triggered at the GDM login screen, while the earlier hibernate/resume failures that I have seen result in a blank screen and X does not even start (nor other services from what I can tell; I enabled SSH on the laptop to see if I can SSH to it when it freezes).

Matt, did you update your BIOS to 1.17? Lenovo posted a new BIOS update. I have updated mine and saw the above GPF (could be unrelated to the BIOS update).

(In reply to Hemant Kumar from comment #20)
> So I got similar GPF on resume. Again not always. But I think this may be
> unrelated to issue we reported earlier. The GPF was triggered at GDM login
> screen. While earlier hibernate/resume failures that I have seen results in
> blank screen and X does not even start (nor other services from what I can
> tell, I enabled SSH on laptop and see if I can SSH to it when it freezes).
>
> Matt, did you update your BIOS to 1.17? Lenovo posted a new BIOS update. I
> have updated mine and saw above GPF (could be unrelated to BIOS update).

Hemant,
WOOHOOO Success! I have applied Chen's patch to my kernel and have been able to hibernate/resume successfully 15+ times in a row. Before, I would get 3-8 successes before a failure. I am confident that his patch did something. Will you apply the patch and do some testing as well? I will continue to test for the next few days.
----------
SIDE NOTE: The debug kernel I built and the testing using the kernel's built-in test interface have gone nowhere, but I posted the info so Aaron and Chen can have some idea where the problem isn't located.

In response to comment 15:
The bug has returned! However, the probability of occurrence has gone way down. I have gotten 10-20 hibernate/resume cycles each time before seeing the bug. Previously it was 3-8 cycles.
Chen, I think the patch helped, but did not completely fix the problem.
You will have to instruct me on how to get further debug information.

About the serial log: it doesn't help much with a USB-converted serial port, as that is actually a USB device sitting on top of a PCI device, which means the log will only appear after the PCI and USB subsystems are initialized/resumed (the same is true for netconsole). In the case of a kernel problem, that usually doesn't happen. My previous comment about the serial log meant a real serial port, which is only available on development/debug platforms nowadays.

(In reply to Aaron Lu from comment #23)
> About serial log: it doesn't help much with USB converted serial, as that is
> actually a USB device sitting on top of a PCI device which means the log
> will only appear after the PCI and USB subsystem are initialized/resumed(the
> same is true for netconsole). In the case of a kernel problem, that usually
> doesn't happen. My previous comment about serial log means the real serial
> port, which is only available on develop/debug platform nowadays.

OK. How do we proceed? Do you need any additional test data from me?

Hi Matt,
that patch just replaces a possible 'panic' during resume with a 'failure'. Can you please also do the test I mentioned above (without my patch applied):
1. compile your disk driver as built-in (CONFIG_SATA_AHCI=y for example)
2. boot the system with init=/bin/bash
3. after boot up, do:
3.1 swapon /dev/sda3 (your actual disk partition, not uuid please)
3.2 echo disk > /sys/power/state
4. push the power button to boot up again, and append resume=/dev/sda3 to the cmdline for the second kernel (not uuid please), to see if it works.
These steps are to confirm whether it is related to a driver problem or a critical kernel bug.
Thanks!

Hello,
same problem on a ThinkPad S540 (LENOVO 20B3CTO1WW/20B3CTO1WW) with debian/testing. I'm also using disk encryption (LUKS with LVM) on an SSD (SAMSUNG MZ7TD256HAFV-000L9).
I also tried to get rid of systemd and used sysvinit with the same result, so I'm back to using systemd.
(In reply to Chen Yu from comment #25)
> 1. compile your disk driver as build-in (CONFIG_SATA_AHCI=y for example)
> 2. boot system with init=/bin/bash
> 3. after boot up, do:
> 3.1 swapon /dev/sda3 (your actual disk partition, not uuid please)
> 3.2 echo disk > /sys/power/state,
> 4. push power button to boot up again, append resume=/dev/sda3 in cmdline
> for second kernel (not uuid please), to see if it works.

Since Matt seems busy, I tried these steps. I compiled the kernel with CONFIG_SATA_AHCI=y and booted into it with init=/bin/bash:
- swapon /dev/mapper/SSD--VG-swap
- echo disk > /sys/power/state
- Started up the machine with resume=/dev/mapper/SSD--VG-swap
It worked and I was back at the command line (bash).

For now I could set the resume parameter and some debugging kernel parameters (dyndbg='file kernel/power/hibernate.c +p;file kernel/power/snapshot.c +p') and log the memory usage before hibernation.

Hi Hannes,
does your ThinkPad have the same problem as Matt's, i.e. it fails to resume from hibernation with a black screen? According to your test above, it appears to me that there is a driver problem, and I suspect it is related to graphics and sometimes maybe due to insufficient memory when hibernating (hibernation needs 50% of the memory). So can you please help test again with CONFIG_DRM_I915 not set and compile the kernel again? Thanks!
Yu

(In reply to Chen Yu from comment #27)
> Hi, Hannes
> does your thinkpad have the same problem with Matt's:
> it failed to resume from hibernation with black screen? because according to
> your test above, it appears to me that there is a driver problem, and I
> suspect it is related to graphics and sometime maybe due to insufficient
> memory when hibernating(hibernation needs 50% of the memory). So can you
> please help test again with CONFIG_DRM_I915 not set and compile the kernel
> again? thanks!
> Yu

Hi,
I assume it's the same problem. Sometimes hibernation works and sometimes it fails.
If it fails, I can see the image loading (removed "quiet" from kernel parameters) until 100% and then I'm stuck on the "black" screen.

Yesterday I had some time and played around. Hibernation with a low memory footprint (free says about 200-400M of 16G RAM) works flawlessly, but it seems that if more memory (about 600M) is used it will fail to resume.

Today I tried the LTS Kernel 3.18.22 and this one works even with a memory usage of 5G or more.

So I guess it has something to do with dumping memory to swap? My swap partition is bigger than my memory, not by much, but it worked in the past and also with the 3.x kernel.

When I configure my laptop without CONFIG_DRM_I915 it will not boot. Or should I boot to bash, fill the memory and test again?

journalctl -all | grep 'CUSTOM\|PM: Allo'

Here is some output with the working kernel:
Okt 29 11:30:02 Lenovo-S540 hannes[2538]: CUSTOM: Starting hibernate!!!
Okt 29 11:30:02 Lenovo-S540 hannes[2539]: CUSTOM: Mem used: 217160 Swap used: 0
Okt 29 12:04:17 Lenovo-S540 kernel: PM: Allocated 879544 kbytes in 0.26 seconds (3382.86 MB/s)
Okt 29 12:16:00 Lenovo-S540 hannes[4040]: CUSTOM: Starting hibernate!!!
Okt 29 12:16:00 Lenovo-S540 hannes[4041]: CUSTOM: Mem used: 676908 Swap used: 0
Okt 29 14:35:55 Lenovo-S540 kernel: PM: Allocated 1724576 kbytes in 0.29 seconds (5946.81 MB/s)
Okt 29 14:54:33 Lenovo-S540 hannes[5491]: CUSTOM: Starting hibernate!!!
Okt 29 14:54:33 Lenovo-S540 hannes[5492]: CUSTOM: Mem used: 1119652 Swap used: 0
Okt 29 15:38:39 Lenovo-S540 kernel: PM: Allocated 4065636 kbytes in 0.42 seconds (9680.08 MB/s)
Okt 29 15:49:25 Lenovo-S540 hannes[6532]: CUSTOM: Starting hibernate!!!
Okt 29 15:49:25 Lenovo-S540 hannes[6533]: CUSTOM: Mem used: 1525036 Swap used: 0
Okt 29 16:05:05 Lenovo-S540 kernel: PM: Allocated 4453240 kbytes in 0.43 seconds (10356.37 MB/s)
Okt 29 16:16:04 Lenovo-S540 hannes[7515]: CUSTOM: Starting hibernate!!!
Okt 29 16:16:04 Lenovo-S540 hannes[7516]: CUSTOM: Mem used: 5237928 Swap used: 0
Okt 29 16:25:35 Lenovo-S540 kernel: PM: Allocated 9811472 kbytes in 2.43 seconds (4037.64 MB/s)

And here is the output when it fails (one success, and the last one did not come back):
Okt 28 21:27:00 Lenovo-S540 hannes[9291]: CUSTOM: Starting hibernate!!!
Okt 28 21:27:00 Lenovo-S540 hannes[9292]: CUSTOM: Mem used: 503056 Swap used: 0
Okt 28 21:28:24 Lenovo-S540 kernel: PM: Allocated 1452204 kbytes in 0.30 seconds (4840.68 MB/s)
Okt 28 21:28:59 Lenovo-S540 hannes[9676]: CUSTOM: Starting hibernate!!!
Okt 28 21:28:59 Lenovo-S540 hannes[9677]: CUSTOM: Mem used: 606044 Swap used: 0

Let me know which information you need to locate the issue.

(In reply to Hannes Fuchs from comment #28)
> (In reply to Chen Yu from comment #27)
> > Hi, Hannes
> > does your thinkpad have the same problem with Matt's:
> > it failed to resume from hibernation with black screen? because according to
> > your test above, it appears to me that there is a driver problem, and I
> > suspect it is related to graphics and sometime maybe due to insufficient
> > memory when hibernating(hibernation needs 50% of the memory). So can you
> > please help test again with CONFIG_DRM_I915 not set and compile the kernel
> > again? thanks!
> > Yu
>
> Hi,
>
> i assume it's the same problem. Sometimes hibernation works and sometimes it
> fails. If it fails, I can see the image loading (removed "quiet" from kernel
> parameters) until 100% and then I'm stuck on the "black" screen.

Do you mean that after the monitor shows the image is 100% loaded, a moment later the screen turns black?

> Yesterday I had some time and played around. Hibernation with a low memory
> footprint (free says about 200-400M from 16G RAM) works flawlessly but it
> seems if more memory (about 600M) is used it will fail to resume.
>
> Today I tried the LTS Kernel 3.18.22 and this one works even with a memory
> usage of 5G or more.
>
> So I guess it's something with in direction memory dumping to swap? My swap

I'm not sure how the memory occupation would affect resuming from hibernation. Since you get a 'black screen', I'd prefer to first confirm whether it is related to an i915 graphics display problem.

> partition is bigger than my memory, not much but it worked it the past and
> also with 3.x kernel.
>
> When I configure my laptop without CONFIG_DRM_I915 it will not boot. Or
> should I boot to bash, fill the memory and test again?

Yes, please boot to bash with CONFIG_DRM_I915 disabled (you can use nomodeset on the command line).

(In reply to Chen Yu from comment #29)
> (In reply to Hannes Fuchs from comment #28)
> > (In reply to Chen Yu from comment #27)
> > i assume it's the same problem. Sometimes hibernation works and sometimes it
> > fails. If it fails, I can see the image loading (removed "quiet" from kernel
> > parameters) until 100% and then I'm stuck on the "black" screen.
>
> Do you mean, after the monitor shows that image is 100% loaded, a moment
> later, the system turns into black?

It's not really a black screen, it's stuck at 100%. See appended image as screenshot.

> > When I configure my laptop without CONFIG_DRM_I915 it will not boot. Or
> > should I boot to bash, fill the memory and test again?
> yes, please boot to bash with CONFIG_DRM_I915 disabled (you can use
> nomodeset in commandline)

I ran multiple tests to be safe:
I.) Kernel 4.2.5 *without* CONFIG_DRM_I915, CONFIG_SATA_AHCI=m, into init=/bin/bash
II.) Kernel 4.2.5 *without* CONFIG_DRM_I915, CONFIG_SATA_AHCI=y, into init=/bin/bash
III.) Kernel 4.2.5 *with* CONFIG_DRM_I915, CONFIG_SATA_AHCI=m, normal boot into DE

I.)
I compiled Kernel 4.2.5 *without* CONFIG_DRM_I915 and booted with the following boot params:
root=/dev/mapper/SSD--VG-root ro resume=/dev/mapper/SSD--VG-swap init=/bin/bash
After booting I did this:
1. swapon /dev/mapper/SSD--VG-swap
2. echo disk > /sys/power/state
3. power on with the mentioned boot params
4. back to 2.
I resumed 15 times without a problem. The memory consumption was about 42M, which I checked with "free -m". So it seems hibernation works fine. I think I could do this the whole day long.

At this point I began to allocate memory with a simple python script. For that I started screen, allocated memory and began another loop of hibernate/resume cycles. After every 3 successful hibernate/resume cycles I increased the memory usage.
a. +32M → ~88M memory consumption → 3 hibernate/resume cycles → no problem
b. +64M → ~121M memory consumption → 3 hibernate/resume cycles → no problem
c. +128M → ~186M memory consumption → 3 hibernate/resume cycles → no problem
d. +256M → ~314M memory consumption → 3 hibernate/resume cycles → no problem
e. +512M → ~571M memory consumption → 3 hibernate/resume cycles → no problem
f. +1024M → ~1086M memory consumption → 3 hibernate/resume cycles → no problem
g. +2048M → ~2117M memory consumption → 1st resume → failed

II.)
Same as in I.):
+2048M → ~2116M memory consumption → 1st resume → failed
I booted again into bash and wanted to force the error:
a. +4096M → 4174M memory consumption → 3 hibernate/resume cycles → no problem
b. +8192M → 8295M memory consumption → 1st resume → failed

III.)
After booting into /bin/bash and the hibernate/resume cycles, I booted my old Kernel 3.18.22 and recompiled Kernel 4.2.5 *with* CONFIG_DRM_I915 *enabled*. In this case the boot params were set to:
root=/dev/mapper/SSD--VG-root ro resume=/dev/mapper/SSD--VG-swap
So I logged into my DE (which is i3) and did the following:
1. systemctl hibernate
2. power on, was back to my desktop
3. systemctl hibernate
4. power on ...
and so on
Here, too, I resumed 15 times without a problem; the memory consumption was about 225M. So, also no problem with resume. Then I increased the memory usage with a python script:
a. +32M → ~265M memory consumption → 3 hibernate/resume cycles → no problem
b. +64M → ~300M memory consumption → 3 hibernate/resume cycles → no problem
c. +128M → ~365M memory consumption → 3 hibernate/resume cycles → no problem
d. +256M → ~392M memory consumption → 2nd resume → failed!

Conclusion:
The behaviour is very strange. It seems that I can force a failure if I allocate more memory, but I cannot say what the limit is. I tested the RAM modules with memtest, no errors.

Script to allocate memory:
#!/usr/bin/env python
import sys
import numpy

if __name__ == "__main__":
    if len(sys.argv) == 2:
        try:
            m = int(sys.argv[1])
            print "allocating %i Mbytes" % m
            tmp = [numpy.random.bytes(1024*1024) for x in xrange(m)]
            print "%i Mbytes allocated" % m
            raw_input("press key to exit")
        except:
            print "can not parse arg: <%s>" % sys.argv[1]
            sys.exit(1)
    else:
        print "one arg needed"
        sys.exit(1)
    sys.exit(0)

Created attachment 192201 [details]
failed resume output
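The allocation script in the comment above is written for Python 2 and depends on numpy. For anyone wanting to repeat the memory-pressure test on a current system, the following is a rough Python 3 equivalent using only the standard library. It is a sketch, not the script that was used for the results above: os.urandom stands in for numpy.random.bytes, and the function names are hypothetical.

```python
import os

MIB = 1024 * 1024


def allocate(mbytes):
    """Allocate and keep `mbytes` blocks of 1 MiB, each filled with random bytes."""
    return [os.urandom(MIB) for _ in range(mbytes)]


def main(argv):
    """Command-line entry point; returns a process exit status."""
    if len(argv) != 2:
        print("one arg needed")
        return 1
    try:
        m = int(argv[1])
    except ValueError:
        print("can not parse arg: <%s>" % argv[1])
        return 1
    print("allocating %i Mbytes" % m)
    blocks = allocate(m)  # keep a reference so the memory stays allocated
    print("%i Mbytes allocated" % len(blocks))
    input("press Enter to exit")
    return 0
```

To use it as a script, the file would end with `import sys` and `sys.exit(main(sys.argv))`; the process then holds the memory until Enter is pressed, so hibernation can be triggered from another terminal or screen window.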
Created attachment 201351 [details]
Force set up the temporary page table for non-boot cpus before restoring the page frames

Please apply this patch to see if it works for you.
I noticed that someone has reported CPU hangs while restoring the page frames because of a corrupt page table for non-boot CPUs. This patch is a trivial version of a fix for it.
Related link: https://bugzilla.kernel.org/show_bug.cgi?id=106371

Still locking up here on an X1 Carbon (3rd gen) with the patch applied to Debian's 4.3.3 kernel.

Same here on my Lenovo S540 (Intel i5-4200U). The behavior is the same: stuck at 100% loading the image. With Kernel 4.4 the machine reboots after being stuck for some seconds at 100% image loading. As before, it fails sometimes but not every time, and it seems it can be "forced" by high memory usage.

Created attachment 202461 [details]
disable BUG_ON when resuming in IRQ rcu grace period
I found another bug that might cause the system to hang or reboot after restoring the system. I am not sure whether it is related to this bug, but it would be nice if you could help test whether this patch works (do not apply any other patches). Please give it a try; otherwise I might need to propose some debugging patches to track down your problems. Thanks.
Still hanging with the latest patch applied. I see the same behavior as Hannes, where at low memory utilization on hibernate (~1GB), resume works fine. Once I get up around 2GB, resume hangs. My system has 16 GB RAM (and 16 GB swap).

Thanks for your effort, Chen. I also applied your latest patch on kernels 4.3.3 and 4.4 and can confirm the behavior described by Matthew. With a memory usage above 1GB the resume will fail, but mostly not on the first attempt.

I've been experiencing this problem on a Lenovo T450s running Kubuntu 15.04, Linux 4.2.0-27-generic. Has anyone had any luck identifying a fix or workaround for this? For me, resume fails at least 75% of the time with the same symptoms reported: resume starts, shows loading of the image, screen goes black, fan kicks on, unable to log in remotely, fn keys work, but otherwise the system seems unresponsive to the keyboard. I'll be happy to provide any logs or other information that would be useful to diagnose.

Hello Tom, the only workarounds are:
- avoid hibernate and use suspend on new kernel versions (4.x)
- use an older kernel, e.g. longterm 3.18.x (works for me)

I am having the same issue (black screen after resume, with a likelihood that seems to increase with memory usage) on a T450s with linux-4.4.5. One useful workaround that I would like to share is to use Intel Rapid Start Technology instead. A suspended machine is automatically hibernated after a set amount of time. This mechanism doesn't use the kernel's hibernate/resume mechanism and works flawlessly. The only change required is to designate the swap partition as type IRST instead of swap.

Reproduced. Hardware is ThinkPad X1 Yoga, kernel is 4.6-rc5. I think this applies widely to ThinkPads. Changing the title may be appropriate.

I have the same issue on Arch with kernel 4.5.3-1-ARCH. The first resume after boot is usually OK, later ones fail. It is somehow related to memory usage. I have written a small .c app to allocate almost all RAM to easily reproduce it without a desktop environment (XFCE). Since a failed resume doesn't get written to the logs I have attached a screenshot. It fails with "general protection fault: 0000 [#1] PREEMPT SMP". My laptop is a Lenovo G710.

Created attachment 216221 [details]
Screenshot of general protection fault
(In reply to Rafał Przywara from comment #42)
> I have the same issue on Arch with kernel 4.5.3-1-ARCH. The first resume
> after boot is usually OK, later ones fail. It is somehow related to memory
> usage. I have written a small .c app to allocate almost all RAM to easily
> reproduce it without desktop environment (XFCE). Since failed resume doesn't
> get written to the logs I have attached the screenshot. It fails with
> "general protection fault: 0000 [#1] PREEMPT SMP". My laptop is Lenovo G710.

Thanks, this is useful. Could you also share your .c source and your kernel config? Besides, please help check with this 'patch' whether any warning is reproduced during resume:
https://patchwork.kernel.org/patch/7454481/

(In reply to Chen Yu from comment #44)
> (In reply to Rafał Przywara from comment #42)
> > I have the same issue on Arch with kernel 4.5.3-1-ARCH. The first resume
> > after boot is usually OK, later ones fail. It is somehow related to memory
> > usage. I have written a small .c app to allocate almost all RAM to easily
> > reproduce it without desktop environment (XFCE). Since failed resume doesn't
> > get written to the logs I have attached the screenshot. It fails with
> > "general protection fault: 0000 [#1] PREEMPT SMP". My laptop is Lenovo G710.
>
> Thank, this is useful, could you also share your cpp and your kernel config?
> besides please help check this 'patch' if there is any warning(reproduce)
> during resume?
> https://patchwork.kernel.org/patch/7454481/

I patched kernel 4.5.4 but I don't see any warnings in the logs pertaining to it. Meanwhile I tested various kernels and so far the latest one working for me is 3.19.3. The next one I tested, i.e. 4.0.7, hangs during resume.

How I test it:
1) reboot
2) run swap.c to allocate some memory
3) in another tty do systemctl hibernate
4) resume
5) if it doesn't hang then goto 3)

Usually the second cycle hangs. Attached: kernel 4.5.4 config and source of my alloc test app.

Created attachment 216311 [details]
Test alloc app
Created attachment 216321 [details]
Kernel 4.5.4 config (Lenovo G710)
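Since a failed resume leaves nothing in the normal logs, the test loop above could be driven by a small script that records each cycle in a file and forces every entry to disk. The following is only a sketch under stated assumptions: systemd is in use (as in the steps above), the script runs with enough privileges to hibernate, and the machine is powered on again by hand after each cycle. The names log_stage, run_cycles and the settle pause are hypothetical; the pause is a guess at how long the system needs to start freezing after the hibernate request returns.

```python
import os
import subprocess
import time


def log_stage(log_path, cycle, stage):
    """Append one line and fsync it so the entry survives a hard hang."""
    with open(log_path, "a") as fh:
        fh.write("cycle %d: %s\n" % (cycle, stage))
        fh.flush()
        os.fsync(fh.fileno())


def systemctl_hibernate():
    """Ask systemd to hibernate, as in step 3 of the procedure above."""
    return subprocess.call(["systemctl", "hibernate"])


def run_cycles(count, log_path, hibernate=systemctl_hibernate, settle=30):
    """Request `count` hibernations, logging before and after each one.

    `settle` (seconds) is an assumed grace period: the request may return
    before the system actually freezes, so the script waits before it
    records the cycle as resumed.
    """
    for cycle in range(1, count + 1):
        log_stage(log_path, cycle, "hibernating")
        hibernate()
        time.sleep(settle)
        log_stage(log_path, cycle, "resumed")
```

After a hang and a cold boot, a trailing "hibernating" entry without a matching "resumed" entry shows which cycle failed. The hibernate argument can be swapped for a dummy function to try the logging without touching the machine.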
After a successful resume from hibernate the following error message is printed:
[drm:gen9_set_dc_state [i915]] *ERROR* DC state mismatch (0x0 -> 0x2)
Is it useful to try to always force the corrective action for this mismatch to see if that fixes the black screen issue? If that were the case perhaps the mismatch detection is not robust. Although, why the mismatch happens in the first place is perhaps a more useful question.
P.S. I also tried Chen Yu's patch (7454481) but I don't get the patched-in warnings printed during resume either.

I also see this warning appear in the logs, not sure if relevant:
---
WARNING: CPU: 0 PID: 807 at drivers/gpu/drm/i915/intel_uncore.c:599 hsw_unclaimed_reg_debug+0x69/0x90 [i915]()
Unclaimed register detected after reading register 0x65f10
---

Just dropping in to report I have the same issue. I have been living with it on a Carbon X1 2nd generation for months (a year?) and have tried all kinds of reworking of settings (BIOS and power management). I got a Carbon 4th generation this week -- same issue. The odd thing is that I had the same issue with the 3.16.0.7 kernel. Only when I used much older kernels did I not have the issue. But the 2nd generation Carbon X1 had that annoying capacitive touch keyboard, so you need to use 3.14 or later if you want to properly use the machine. If you can live without a functioning keyboard, the 3.9 kernel I was using before had no issue. I use BTRFS and overlays so I prefer to stay more current.

The suggestion around IRST is moot. I did enable this and start using it. It was working OK, but I noticed that if you let the machine sit for a few days and then try to start it up, it has seemingly forgotten it was in "deep sleep", and instead of awakening from "deep sleep" via IRST, it cold boots. I also witnessed an annoying bug where the system would start up randomly from deep sleep and not shut down until the battery hit 47%.
I would know it had woken up because I would start up the machine and it would be cold booting and the battery would be at 47% (having been either 80% or fully charged the night before). I had my setting set at 30 mins, so it would be suspended for 30 mins, then wake up and go into deep sleep. At this point the suspend light goes off, and the next time you open the lid, you see "returning from deep sleep". To test this "waking up" theory, I left it plugged in to AC last week and put it in IRST (suspended, then confirmed 30 mins later that it was in deep sleep), went to work, and 8 hours later got home to a system that was booted up and sitting at grub.

I would rather use hibernation. It works on all kernels I throw at it on all my other machines (ThinkPad and other). I would go months between cold boots on a T420.

(In reply to Rafał Przywara from comment #45)
> (In reply to Chen Yu from comment #44)
> > (In reply to Rafał Przywara from comment #42)
> > > I have the same issue on Arch with kernel 4.5.3-1-ARCH. The first resume
> > > after boot is usually OK, later ones fail. It is somehow related to memory
> > > usage. I have written a small .c app to allocate almost all RAM to easily
> > > reproduce it without desktop environment (XFCE). Since failed resume doesn't
> > > get written to the logs I have attached the screenshot. It fails with
> > > "general protection fault: 0000 [#1] PREEMPT SMP". My laptop is Lenovo G710.
> >
> > Thank, this is useful, could you also share your cpp and your kernel config?
> > besides please help check this 'patch' if there is any warning(reproduce)
> > during resume?
> > https://patchwork.kernel.org/patch/7454481/
>
> I patched kernel 4.5.4 but I don't see any warnings in logs pertaining to
> it. Meanwhile I tested various kernels and so far the latest working for me
> is 3.19.3. Next one I tested ie 4.0.7 hangs during resume. How I test it:
>
> 1) reboot
> 2) run swap.c to allocate some memory
> 3) in another tty do systemctl hibernate
> 4) resume
> 5) if it doesn't hang then goto 3)
>
> Usually second cycle hangs. Attached: kernel 4.5.4 config and source of my
> alloc test app.

Is there any difference with the following patch applied? (Do not apply the patch from comment 44.)
https://patchwork.kernel.org/patch/9158227/

I applied 9158227 to linux 4.6.1 but still get the black screen at resume from hibernate. Are there any helpful messages in the logs that we should look for? By the way, the chance that a resume fails seems to be higher in the 4.6 kernel than in 4.5. But I have not tested this often enough to be able to say how much it differs.

After two days of kernel compiling and many hibernations/reboots I have some results that might help.

tl;dr: The latest working kernel (for me) is 3.18.22; on 3.18.23 the resume does not work and hangs at 100% while loading the image. The patch (9158227) mentioned in comment 51 did not work.

I found one odd behavior: when the memory usage increases after a successful resume, the next resume from hibernation will fail; if it stays the same it works (kernel version doesn't matter), up to ~6 GB. 6 GB (of 16 GB physical memory) will mostly fail.

How I tested:
1.) Boot into kernel x.y.z
2.) Allocate 1GB with Rafał Przywara's alloc app
3.) systemctl hibernate
4.) resume
5a.) If it doesn't hang goto 2.) but close the app and reallocate.
5b.) After 3 successful resumes increase the allocation: 2GB, 4GB, 6GB, 8GB, 14GB and resume until it hangs
5c.) If it hangs, try again (mostly it will work with the new memory allocation), after 3 hangs abort

At first I wanted to know which kernel is the latest working one and at which version the error first occurs, so I tested the 3.18.x versions. Here I made a "fast" test, which means I began with an 8 GB allocation and, if that worked, increased it to 14 GB (which fails, except on 3.18.22 and lower):
* 3.18.35 - failed at 8 GB
* 3.18.24 - failed at 8 GB
* 3.18.23 - failed at 8 GB
* 3.18.22 - working

Then I made more runs with 3.18.22:

+--------+---------+---------+---------+
| *Size* | *Run 1* | *Run 2* | *Run 3* |
+--------+---------+---------+---------+
| 1 GB   | OK      | OK      | OK      |
| 2 GB   | OK      | OK      | OK      |
| 4 GB   | OK      | OK      | OK      |
| 6 GB   | OK      | OK      | OK      |
| 8 GB   | OK      | OK      | OK      |
| 14 GB  | OK      | OK      | OK      |
+--------+---------+---------+---------+

Looks good and works. For 3.18.23:

+--------+---------+---------+---------+---------+---------+
| *Size* | *Run 1* | *Run 2* | *Run 3* | *Run 4* | *Run 5* |
+--------+---------+---------+---------+---------+---------+
| 1 GB   | OK      | OK      | OK      | skipped | skipped |
| 2 GB   | FAILED  | OK      | OK      | OK      | skipped |
| 4 GB   | FAILED  | OK      | OK      | OK      | skipped |
| 6 GB   | FAILED  | OK      | FAILED  | FAILED  | FAILED  |
| 8 GB   | FAILED  | OK      | FAILED  | FAILED  | FAILED  |
| 14 GB  | FAILED  | FAILED  | FAILED  | skipped | skipped |
+--------+---------+---------+---------+---------+---------+

When the memory allocation was increased, the resume failed. But after a fresh boot it worked with the same allocation (2GB/4GB, Run 2 to Run 4). Run 2 with 6 and 8 GB was only luck.

After that I tested the 4.x kernel versions. All failed at 8 GB, so I didn't take a deeper look into it.
Then I tested the patch (9158227) with different kernels:
* 3.18.23 - failed at 8 GB
* 3.18.35 - failed at 14 GB
* 4.1.26 - failed at 14 GB (3rd run)
* 4.4.13 - failed at 8 GB
* 4.5.7 - failed at 8 GB
* 4.6.2 - failed at 8 GB

Kernel 4.1.26 (patched) looked promising, so I made more tests:

+--------+---------+---------+---------+---------+---------+---------+
| *Size* | *Run 1* | *Run 2* | *Run 3* | *Run 4* | *Run 5* | *Run 6* |
+--------+---------+---------+---------+---------+---------+---------+
| 1 GB   | OK      | OK      | OK      | skipped | skipped | skipped |
| 2 GB   | FAILED  | OK      | OK      | OK      | skipped | skipped |
| 4 GB   | FAILED  | OK      | OK      | OK      | skipped | skipped |
| 6 GB   | FAILED  | FAILED  | OK      | FAILED  | FAILED  | FAILED  |
+--------+---------+---------+---------+---------+---------+---------+

The 8 and 14 GB tests were skipped.

Hope this helps.

(In reply to Hannes Fuchs from comment #53)
> Hope this helps.

Recently Rafael fixed a resume hang issue caused by incorrect page table copy-back, and I think it might be related to memory usage as you described. Would you please check whether the following patch works for you (it should be in the coming upstream kernel)?
https://patchwork.kernel.org/patch/9172981/

(In reply to Chen Yu from comment #54)
> https://patchwork.kernel.org/patch/9172981/

This patch (9172981) fixed the bug for me. Tested with kernel 4.7.0-rc3 with the patch (9172981) applied. Allocated memory: 1x 8 GB, 5x 14 GB. Great work!

Out of curiosity: can this be backported to LTS kernel 4.4, which is what a lot of distros are shipping right now?

The above patch alone did not resolve this issue for me when applied to Debian's 4.5 kernel. So, either there are other patches that need to be backported, or this issue is not actually resolved. I also encountered issues getting my system to enter hibernation after applying this patch. My system would often lock up completely. I'll try a more recent kernel tonight and report back.
This patch seems to completely fix the issue for me, on linux-4.6.2. Thanks a lot for this fix!

All is well with kernel 4.7.0-rc3, so there may be something else required to get this working in 4.5 (and possibly earlier kernels). Thanks for the fix!

Installed kernel 4.6.2, then downloaded the source, patched it, and installed the patched kernel. It survived the hibernation thaw but the system is unstable. Within a few minutes I was not able to invoke any applications or even reboot the system. Will continue to test.

The freeze a few minutes after thawing from hibernation had the following event message associated with it:
perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
It might be some other issue but could be related to this change. Will continue to test.

I have been testing over the past week. No more instability after thaw, but this morning I was greeted with a blinking cursor on a black screen when trying to thaw. Starting just before that and continuing since, I have noticed a TPM error on bootup (TPM could not read pcr).

Guys, the latest patches are at:
https://patchwork.kernel.org/patch/9208541/
https://patchwork.kernel.org/patch/9202321/
Would you please help check whether they work?

Chen, thanks for passing this on. I have two questions: Patch 9172981 already fixed the issue for me. It seems that patch 9208541 is a newer variant, but how do we determine whether it is better than the old patch? What are the issues with the old patch that are solved in the newer version? And how is patch 9202321 to be tested? How does it relate to the black screen on resume issue?

The old patch does not handle a corner case where the restore_code may be in a huge page, for which we did not set up a mapping. Otherwise the patches are almost the same. 9202321 is just to avoid a theoretical non-boot CPU panic during resume. No special actions are required to test this patch.
Hi there, I just applied both patches (9208541 + 9202321) to Manjaro's current 4.6 kernel (4.6.3-1-MANJARO) and it fixed the hibernate issue for me on a Thinkpad Yoga S1. Everything runs nicely so far and I could do a couple of hibernates with about 90% of 8GB RAM used. I will report back if I run into any issues. Thanks a lot for the fix!! Christoph

I can report that the two patches (9208541 + 9202321) work well on a T460s, too. I have tested hibernate-resume extensively and not experienced any black screen (or halts of any sort) on resume. I consider this bug fixed. Thanks a lot to Chen Yu and Rafael Wysocki for fixing the issues!!

Close because patch has been merged:

commit 406f992e4a372dafbe3c2cff7efbb2002a5c8ebd
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Jul 14 03:55:23 2016 +0200

    x86 / hibernate: Use hlt_play_dead() when resuming from hibernation

and:

commit 65c0554b73c920023cc8998802e508b798113b46
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Jun 30 18:11:41 2016 +0200

    x86/power/64: Fix kernel text mapping corruption during image restoration

Patches 9208541 and 9202321 work perfectly on my laptop, a ThinkPad T550, but not on my PC, a ThinkCentre M8500. It panicked every time after resuming from hibernation. The more times it hibernated, the worse the panics got, until it finally crashed completely.
So I also applied the patch at https://bugzilla.kernel.org/attachment.cgi?id=188341. That resolved the panic, but now my Linux system crashes in various ways: sometimes a page fault, sometimes it gets stuck opening a terminal. And it will certainly freeze completely if I press ctrl+alt+f1 at that moment. I think it is still a panic, but I get nothing from dmesg.

Please help. @Chen Yu.

(In reply to iamtiancaif from comment #70)
> Patch 9208541 and 9202321 work perfectly on my laptop thinkpad T550.
> But not ok with my PC, ThinkCentre M8500. It got panic every time after
> resume from hibernation. The more times it hibernate the panic worse and
> finally crash completely.
> So I applied patch https://bugzilla.kernel.org/attachment.cgi?id=188341 too.
> Then panic was solved, but my linux crash in various ways. Sometimes page
> fault, sometimes stuck on opening terminal. And certainly it will freeze
> completely if I press ctrl+alt+f1 at this moment. I think it's still a
> panic, but I get nothing with dmesg.
>
> Please help. @Chen Yu.

Please check with the latest upstream vanilla kernel without any other patches applied (most of them have been merged upstream), and please provide the panic log if possible.

Created attachment 255693 [details]
chromium damage hibernation
If I open chromium, especially with tons of tabs open, my system becomes very slow and finally panics and freezes completely. Before the system freezes, there are sometimes many syslog messages like this:
kernel: page: count: mapcount mapping: (null) .
I have removed all the numbers from that message.
I compiled kernel 4.6.4 from source and applied patches 9208541 and 9202321.
I stay on this kernel because I need the Unity mode of VMware 11, which is disabled in VMware 12, and it is difficult to make VMware 11 compatible with newer kernels. So I keep using kernel 4.6.4.
Please help~
Thanks!!!
(In reply to Chen Yu from comment #71)
> (In reply to iamtiancaif from comment #70)
> > Patch 9208541 and 9202321 work perfectly on my laptop thinkpad T550.
> > But not ok with my PC, ThinkCentre M8500. It got panic every time after
> > resume from hibernation. The more times it hibernate the panic worse and
> > finally crash completely.
> > So I applied patch https://bugzilla.kernel.org/attachment.cgi?id=188341 too.
> > Then panic was solved, but my linux crash in various ways. Sometimes page
> > fault, sometimes stuck on opening terminal. And certainly it will freeze
> > completely if I press ctrl+alt+f1 at this moment. I think it's still a
> > panic, but I get nothing with dmesg.
> >
> > Please help. @Chen Yu.
>
> Please check with latest upstream Vanilla kernel without any other patches
> applied(most of them have been merged upstream) and please provide the panic
> log if possible.

Hi, @Chen Yu. I have posted the error log in comment 72. Everything is OK if I close chromium before hibernating, even when I have exactly the same tabs, tons of them, open in firefox.

I am running Debian 8:

PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

And this is my grub config:

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# freeze more often with: initrd=/install/initrd.gz
# GRUB_CMDLINE_LINUX="initrd=/install/initrd.gz"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

Thank you very much.