Created attachment 172241 [details] Linux Version Script

Overview:
The system is unable to hibernate to disk. Both methods, the kernel's built-in hibernation and s2disk, have been tried. A general protection fault crashes the kernel during the hibernation process (OOPS message at the end of this report).

How to reproduce:
1. Boot the laptop.
2. Log in, open some programs and trigger suspend to disk.
3. The kernel crashes.
This happens at least on my system, but it is certainly hardware dependent.

Actual Result: Kernel fault
Expected Result: Hibernation and S4 sleep

Workaround:
Disable pm_async:
# echo 0 > /sys/power/pm_async
With this, hibernation works perfectly. I guess this is also what the suspend_analyse script does. (The result is attached, but everything works fine when using it.)

Additional Information:
System Description:
DELL Latitude D630
CPU: Dual Core (SMP), Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
ARCH: x86_64
RAM: 4GB
Graphics: Intel GMA X3100 (i915)
Hard Drive: SATA in ATA mode (no AHCI)
PCMCIA Cards: No cards
Swap Partition: 5GB, encrypted (LUKS)
System Partition: EXT4, encrypted (LUKS LVM)
Boot Partition: Separate EXT4 partition (not encrypted)
Bootloader: GRUB2
SELinux: enabled, but unrelated to the hibernation problem (hibernation doesn't work with SELinux disabled either)

Distro:
Name: openSUSE 13.2
Desktop: X11/KF5 using SLiM

Kernel Info:
- x86_64 kernel
- 2 days old (build time: 2015/03/23)
- Built by openSUSE (rpm)
- Vanilla: no SUSE patches (raw mainline code)

OOPS Message:
[ 321.286273] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0
[ 323.284937] general protection fault: 0000 [#1] SMP
[ 323.288011] Modules linked in: bnep bluetooth xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_owner xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_filter ip_tables arptable_filter arp_tables x_tables fuse vmw_vsock_vmci_transport vsock vmw_vmci ctr ccm af_packet snd_hda_codec_idt arc4 snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd parport_pc ppdev parport tg3 iTCO_wdt iTCO_vendor_support ptp pps_core iwl4965 libphy iwlegacy gpio_ich lpc_ich mfd_core mac80211 joydev dell_laptop soundcore cfg80211 dell_wmi sparse_keymap serio_raw pcspkr 8250_fintek shpchp rfkill wmi coretemp kvm_intel acpi_cpufreq dcdbas tpm_tis processor i2c_i801 ac battery kvm tpm thermal i8k dm_crypt xts gf128mul algif_skcipher af_alg sr_mod cdrom ata_generic pcmcia ata_piix firewire_ohci firewire_core crc_itu_t uhci_hcd ehci_pci ehci_hcd i915 yenta_socket i2c_algo_bit pcmcia_rsrc pcmcia_core drm_kms_helper usbcore usb_common drm button video dm_mirror dm_region_hash dm_log dm_mod sg
[ 323.288011] CPU: 0 PID: 321 Comm: kworker/u4:3 Tainted: G W 4.0.0-rc5-1.g3583a4a-vanilla #1
[ 323.288011] Hardware name: Dell Inc. Latitude D630 /0KU184, BIOS A19 06/04/2013
[ 323.288011] Workqueue: events_unbound async_run_entry_fn
[ 323.288011] task: ffff880037066090 ti: ffff880036ff8000 task.ti: ffff880036ff8000
[ 323.288011] RIP: 0010:[<ffffffff8163c956>] [<ffffffff8163c956>] klist_next+0xd6/0xf0
[ 323.288011] RSP: 0018:ffff880036ffbc98 EFLAGS: 00010283
[ 323.288011] RAX: ffff8800d93ae700 RBX: ffff880036ffbcd8 RCX: ffff8800370ef66f
[ 323.288011] RDX: ff8800370847f0ff RSI: ffff880036ffbcd8 RDI: ffff8800d93ae700
[ 323.288011] RBP: ffff880036ffbcc8 R08: 0000000000000000 R09: ffff880060000490
[ 323.288011] R10: 0000000066c8f33e R11: 0000000002da39cc R12: ff8800370847f0f7
[ 323.288011] R13: ffffffff81474f10 R14: 0000000000000000 R15: 0000000000000000
[ 323.288011] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 323.288011] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 323.288011] CR2: 00007f3a8c003016 CR3: 0000000063c81000 CR4: 00000000000007f0
[ 323.288011] Stack:
[ 323.288011] ffff880036ffbd30 0000000000000000 ffff880036ffbd1f ffffffff81485290
[ 323.288011] ffff88011ab1bf00 0000000000000080 ffff880036ffbd08 ffffffff81474ffe
[ 323.288011] ffff8800d93ae700 0000000000000000 ffff880036ffbd08 ffff8800373ff830
[ 323.288011] Call Trace:
[ 323.288011] [<ffffffff81485290>] ? dpm_wait+0x40/0x40
[ 323.288011] [<ffffffff81474ffe>] device_for_each_child+0x4e/0x70
[ 323.288011] [<ffffffff81486332>] __device_suspend+0x42/0x380
[ 323.288011] [<ffffffff8108ff5e>] ? try_to_wake_up+0x1ee/0x320
[ 323.288011] [<ffffffff8148668f>] async_suspend+0x1f/0xa0
[ 323.288011] [<ffffffff81084f1c>] async_run_entry_fn+0x4c/0x160
[ 323.288011] [<ffffffff8107cd8a>] process_one_work+0x14a/0x3f0
[ 323.288011] [<ffffffff8107d461>] worker_thread+0x121/0x460
[ 323.288011] [<ffffffff8107d340>] ? rescuer_thread+0x310/0x310
[ 323.288011] [<ffffffff810823c9>] kthread+0xc9/0xe0
[ 323.288011] [<ffffffff81082300>] ? kthread_create_on_node+0x180/0x180
[ 323.288011] [<ffffffff81651c98>] ret_from_fork+0x58/0x90
[ 323.288011] [<ffffffff81082300>] ? kthread_create_on_node+0x180/0x180
[ 323.288011] Code: 41 0f 95 c7 eb 99 0f 1f 80 00 00 00 00 48 8b 03 45 31 ff 48 8b 48 08 4c 8d 61 f8 eb 82 49 8b 54 24 08 4c 8d 62 f8 49 39 c4 74 a3 <f6> 42 f8 01 74 82 eb ea 66 90 e8 0a 0d 01 00 eb 8b 66 0f 1f 84
[ 323.288011] RIP [<ffffffff8163c956>] klist_next+0xd6/0xf0
[ 323.288011] RSP <ffff880036ffbc98>
[ 323.295406] ---[ end trace 601d92f4a439b5e2 ]---
[ 323.295442] BUG: unable to handle kernel paging request at ffffffffffffffd8
[ 323.295445] IP: [<ffffffff810829b0>] kthread_data+0x10/0x20
[ 323.295447] PGD 1c11067 PUD 1c13067 PMD 0
[ 323.295449] Oops: 0000 [#2] SMP
[ 323.295476] Modules linked in: bnep bluetooth xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_owner xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_filter ip_tables arptable_filter arp_tables x_tables fuse vmw_vsock_vmci_transport vsock vmw_vmci ctr ccm af_packet snd_hda_codec_idt arc4 snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd parport_pc ppdev parport tg3 iTCO_wdt iTCO_vendor_support ptp pps_core iwl4965 libphy iwlegacy gpio_ich lpc_ich mfd_core mac80211 joydev dell_laptop soundcore cfg80211 dell_wmi sparse_keymap serio_raw pcspkr 8250_fintek shpchp rfkill wmi coretemp kvm_intel acpi_cpufreq dcdbas tpm_tis processor i2c_i801 ac battery kvm tpm thermal i8k dm_crypt xts gf128mul algif_skcipher af_alg sr_mod cdrom ata_generic pcmcia ata_piix firewire_ohci firewire_core crc_itu_t uhci_hcd ehci_pci ehci_hcd i915 yenta_socket i2c_algo_bit pcmcia_rsrc pcmcia_core drm_kms_helper usbcore usb_common drm button video dm_mirror dm_region_hash dm_log dm_mod sg
[ 323.295487] CPU: 0 PID: 321 Comm: kworker/u4:3 Tainted: G D W 4.0.0-rc5-1.g3583a4a-vanilla #1
[ 323.295488] Hardware name: Dell Inc. Latitude D630 /0KU184, BIOS A19 06/04/2013
[ 323.295498] task: ffff880037066090 ti: ffff880036ff8000 task.ti: ffff880036ff8000
[ 323.295500] RIP: 0010:[<ffffffff810829b0>] [<ffffffff810829b0>] kthread_data+0x10/0x20
[ 323.295501] RSP: 0018:ffff880036ffba38 EFLAGS: 00010096
[ 323.295502] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000f
[ 323.295503] RDX: 000000000000000e RSI: 0000000000000000 RDI: ffff880037066090
[ 323.295504] RBP: ffff880036ffba38 R08: 0000000000000000 R09: 0000000000000246
[ 323.295505] R10: ffffffff81fe9188 R11: 000000000000001a R12: ffff880037066090
[ 323.295505] R13: 00000000000142c0 R14: 0000000000000000 R15: 0000000000000000
[ 323.295507] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 323.295508] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 323.295509] CR2: 0000000000000028 CR3: 0000000063c81000 CR4: 00000000000007f0
[ 323.295510] Stack:
[ 323.295512] ffff880036ffba58 ffffffff8107d825 ffff880036ffba58 ffff88011fc142c0
[ 323.295513] ffff880036ffbaa8 ffffffff8164dec0 ffff880036e66570 ffff880037066090
[ 323.295515] ffff880036ffbaa8 ffff880036ffbfd8 ffff880037066510 0000000000000246
[ 323.295515] Call Trace:
[ 323.295518] [<ffffffff8107d825>] wq_worker_sleeping+0x15/0xa0
[ 323.295521] [<ffffffff8164dec0>] __schedule+0x6f0/0x950
[ 323.295523] [<ffffffff8164e157>] schedule+0x37/0x90
[ 323.295525] [<ffffffff8106750c>] do_exit+0x6bc/0xb40
[ 323.295528] [<ffffffff8100698f>] oops_end+0x9f/0xe0
[ 323.295530] [<ffffffff81006efb>] die+0x4b/0x70
[ 323.295532] [<ffffffff81003622>] do_general_protection+0xe2/0x170
[ 323.295534] [<ffffffff81474f10>] ? put_device+0x20/0x20
[ 323.295536] [<ffffffff81653cf8>] general_protection+0x28/0x30
[ 323.295537] [<ffffffff81474f10>] ? put_device+0x20/0x20
[ 323.295540] [<ffffffff8163c956>] ? klist_next+0xd6/0xf0
[ 323.295542] [<ffffffff8163c8a4>] ? klist_next+0x24/0xf0
[ 323.295543] [<ffffffff81485290>] ? dpm_wait+0x40/0x40
[ 323.295545] [<ffffffff81474ffe>] device_for_each_child+0x4e/0x70
[ 323.295547] [<ffffffff81486332>] __device_suspend+0x42/0x380
[ 323.295549] [<ffffffff8108ff5e>] ? try_to_wake_up+0x1ee/0x320
[ 323.295551] [<ffffffff8148668f>] async_suspend+0x1f/0xa0
[ 323.295553] [<ffffffff81084f1c>] async_run_entry_fn+0x4c/0x160
[ 323.295555] [<ffffffff8107cd8a>] process_one_work+0x14a/0x3f0
[ 323.295557] [<ffffffff8107d461>] worker_thread+0x121/0x460
[ 323.295559] [<ffffffff8107d340>] ? rescuer_thread+0x310/0x310
[ 323.295561] [<ffffffff810823c9>] kthread+0xc9/0xe0
[ 323.295563] [<ffffffff81082300>] ? kthread_create_on_node+0x180/0x180
[ 323.295564] [<ffffffff81651c98>] ret_from_fork+0x58/0x90
[ 323.295566] [<ffffffff81082300>] ? kthread_create_on_node+0x180/0x180
[ 323.295583] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 30 05 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 323.295584] RIP [<ffffffff810829b0>] kthread_data+0x10/0x20
[ 323.295585] RSP <ffff880036ffba38>
[ 323.295586] CR2: ffffffffffffffd8
[ 323.295587] ---[ end trace 601d92f4a439b5e3 ]---
[ 323.295588] Fixing recursive fault but reboot is needed!
[ 323.299147] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 323.299147] Shutting down cpus with NMI
[ 323.299147] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 323.299147] drm_kms_helper: panic occurred, switching back to text console
[ 323.299147] Rebooting in 90 seconds..
Created attachment 172251 [details] Kernel Modules
Created attachment 172261 [details] PCI Devices
Created attachment 172271 [details] USB Devices
Created attachment 172281 [details] suspend_analyse script stats
Created attachment 172291 [details] suspend_analyse script ftrace
Created attachment 172301 [details] suspend_analyse script dmesg
Created attachment 172321 [details] Kernel Messages on Crash These messages were acquired over a serial connection to another computer, using the no_console_suspend option. The crash happens independently of these options, though.
Created attachment 172401 [details] Crash 2 Dmesg There is another possible outcome when trying to hibernate: the kernel freezes before a hibernation image is created. (In this attempt PCMCIA was completely disabled, so it seems to have nothing to do with this problem.)
Is this a new problem, or did it also happen on older kernels?
Older kernels: I had similar freezes during suspend with older versions too, but I have no kernel logs for them, so I'm not quite sure whether it was the same problem back then. They froze, but there was no kernel fault and no reboot (similar to Crash 2 Dmesg). NOTE: they did not continue the way the newest version does; I waited for several hours.

Newer snapshots: I'm currently trying a newer kernel version from March 24 (4.0.0-rc5-2.g7636f33-vanilla). So far I have not been able to reproduce this problem with it, but I will do some further tries. I had a similar freeze with the newer version too, but it succeeded after an awfully long wait for CPU 1 to come up (I'll attach a dmesg for that too).
Created attachment 172431 [details] Crash 3 New Kernel
An additional note: in case of freezes, there is no GPF in the serial log and no reboot.
Created attachment 172441 [details] Crash Kernel 3.19.2-1.gf2f9797-vanilla Kernel 3.19.2-1.gf2f9797-vanilla (current stable): the kernel freezes too, but without an oops (I don't get this oops every time on 4.0-rc5-1 either). dmesg attached. NOTE (in case you wonder about the drm oopses in the dmesg): they are a different, ugly problem that, as far as I can tell, has nothing to do with hibernation (it is almost completely fixed in 4.0).
Kernel 3.16.7 (openSUSE standard vanilla kernel): I was not able to reproduce the bug with this kernel version, so I assume it doesn't exist there.
OK, I did some further testing and first need to correct some things:
- The suggested workaround does not work; it was only luck (see SMP/non-SMP). New workaround: nomodeset.
- The error happens independently of the pm_async option; my first observations were based on the random behavior of this bug when SMP is enabled.
- The crash can also occur on resume (see dmesg_bug_on_resume.txt).
- I can no longer be sure that the drm errors are unrelated to the suspend problems (see BUG analysis).

I tested 4.0.0-rc5-2.g7636f33-vanilla and the same issue is present there, as well as in 4.0.0-rc5-4.g2458897-vanilla and in 4.0.0-rc5 downloaded from kernel.org.

SMP/non-SMP:
I disabled the second core in the BIOS and tried the nosmp option in GRUB2. Result:
- SMP enabled: the BUG can occur, but not every time; sometimes there is only a freeze.
- SMP disabled: the BUG is reproducible every time; there are no freezes, always the oops message.

BUG analysis:
I analysed the critical code where the crash happens a bit further; a disassembly with comments is attached in disassembly_critical.txt. Because klist_next gets called from dpm_wait in this case, I added a few printk statements to the source code to get more information. A full dmesg of this is attached (see dmesg-rc5-2-debug-device-detailed.txt). The result, however, is that the device on which the fault occurs is different each time. (NOTE: I can attach files showing the exact changes if you want.)
Nevertheless, some driver corrupts the reference-counted lists (klist/kref) in some way. A hint may be the drm oops (in Kernel Messages on Crash):
WARNING: CPU: 0 PID: 271 at ../include/linux/kref.h:47 drm_framebuffer_reference+0x6e/0x80 [drm]()

WORKAROUND NOMODESET:
After further testing, there are no oopses when I boot with nomodeset; suspend works just fine with this option. So my guess is that the i915 graphics driver is to blame.
I know there is currently active development on this driver, as can be seen in the commit logs. I will try the most recent Linux version from git (commit e42391cd048809d903291d07f86ed3934ce138e9) and check whether the bug still exists. If the bug disappears, I will report back and close the report. If you need more information, tell me and I'll try to provide it.
Created attachment 172681 [details] dmesg_bug_on_resume.txt BUG occurring on resume
Created attachment 172691 [details] disassembly_critical.txt Disassembly of the critical section with comments
Created attachment 172701 [details] dmesg-rc5-2-debug-device-detailed.txt Full dmesg with the added debug statements. (The device being processed during the crash differs from attempt to attempt.)
Created attachment 172711 [details] disassembly_critical.txt Some minor corrections in the comments. Summary: some driver creates a corrupted list entry that crashes the kernel.
BUG INFO: This BUG is caused by the i915 drm driver. A missing implementation of drm_framebuffer_unreference causes the list to break; this is also what causes the drm bug. There is only a BUG() inside the placeholder function, so this cannot work.

RESOLVED: I can confirm that this bug is fixed in 4.0.0-rc6 by the changes made in drivers/gpu/drm/drm_crtc.c. The missing implementations were added, and now everything works just fine.

One last question: should I set the bug status to CODE_FIX or PATCH_ALREADY_AVAILABLE?
PATCH_ALREADY_AVAILABLE