Bug 95771

Summary: Crash when trying to hibernate
Product: Drivers
Reporter: higuita (higuita)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: oded.gabbay, szg00000
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.19.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: New crash dump

Description higuita 2015-03-29 03:20:57 UTC
I usually hibernate my machine, and sometimes it looks like it hibernates but fails to shut down the machine... On the next boot I found that the hibernation had failed, as the machine started up normally.

I finally got a USB-serial cable to connect this computer to another one and managed to log this via the serial port when the hibernation failed:


[ 6500.667464] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 6500.667481] IP: [<ffffffffa0360653>] radeon_vm_bo_invalidate+0x73/0xa0 [radeon]
[ 6500.667483] PGD 864dc067 PUD 864dd067 PMD 0 
[ 6500.667485] Oops: 0002 [#1] SMP 
[ 6500.667508] Modules linked in: usb_storage cpufreq_conservative ipt_ECN snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_DSCP nf_nat_irc nf_nat nf_conntrack_irc nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_filter ip_tables x_tables bridge stp llc ipv6 sch_fq_codel pcspkr fuse joydev hid_generic snd_hda_codec_realtek snd_hda_codec_generic tuner_xc2028 dib7000p dvb_usb_dib0700 dib7000m dib0090 dib0070 dib3000mc dibx000_common dvb_usb dvb_core usbhid hid eeepc_wmi asus_wmi sparse_keymap rfkill i2c_dev acpi_cpufreq snd_hda_codec_hdmi tuner_simple tuner_types tea5767 tuner tda7432 tvaudio msp3400 crc32_pclmul crc32c_intel snd_hda_intel ghash_clmulni_intel snd_hda_controller amdkfd amd_iommu_v2 ohci_pci ohci_hcd bttv btcx_risc tveeprom videobuf_dma_sg rc_core radeon v4l2_common snd_hda_codec snd_bt87x fam15h_power k10temp snd_pcm r8169 videodev evdev snd_timer xhci_pci edac_core ehci_pci drm_kms_helper videobuf_core mii microcode xhci_hcd ttm ehci_hcd i2c_piix4 wmi snd parport_pc soundcore parport video processor thermal_sys button loop [last unloaded: lz4_compress]
[ 6500.667526] CPU: 2 PID: 5386 Comm: SteamChildMonit Tainted: G        W      3.19.3-slack #18
[ 6500.667526] Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 2202 01/19/2015
[ 6500.667527] task: ffff8801d630dac0 ti: ffff8800817ec000 task.ti: ffff8800817ec000
[ 6500.667541] RIP: 0010:[<ffffffffa0360653>]  [<ffffffffa0360653>] radeon_vm_bo_invalidate+0x73/0xa0 [radeon]
[ 6500.667542] RSP: 0000:ffff8800817efa68  EFLAGS: 00010246
[ 6500.667543] RAX: ffff8800989ae600 RBX: ffff8801d60a5480 RCX: 0000000000000000
[ 6500.667544] RDX: ffff8801d60a54e0 RSI: ffff8801d4e59400 RDI: ffff8800989ae630
[ 6500.667545] RBP: ffff8801d4e596a8 R08: ffff8801d4fad178 R09: 0000000100400028
[ 6500.667545] R10: ffffea0007bc99c0 R11: ffffffff8171ff65 R12: ffff8801d4e59400
[ 6500.667546] R13: ffff880212864710 R14: 000000000000005e R15: ffff8801d4e59468
[ 6500.667548] FS:  00007fa35204c7c0(0000) GS:ffff88021ed00000(0000) knlGS:00000000ef83fb40
[ 6500.667549] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 6500.667550] CR2: 0000000000000008 CR3: 00000000864db000 CR4: 00000000000407e0
[ 6500.667550] Stack:
[ 6500.667552]  ffff8801d4e59468 ffff8801d4e59468 0000000000000000 ffffffffa029a0dd
[ 6500.667554]  ffff8801d4e59468 0000000000000002 ffff880213f0b300 ffffffffa00ea2de
[ 6500.667555]  ffff8801d4e59498 ffffffffa00eb3b3 0000000000012540 ffff8800817efb20
[ 6500.667556] Call Trace:
[ 6500.667571]  [<ffffffffa029a0dd>] ? radeon_bo_move_notify+0x5d/0xb0 [radeon]
[ 6500.667579]  [<ffffffffa00ea2de>] ? ttm_bo_cleanup_memtype_use+0x1e/0x80 [ttm]
[ 6500.667587]  [<ffffffffa00eb3b3>] ? ttm_bo_release+0x1f3/0x2c0 [ttm]
[ 6500.667590]  [<ffffffff81706a40>] ? drm_gem_dumb_destroy+0x20/0x20
[ 6500.667604]  [<ffffffffa0299408>] ? radeon_bo_unref+0x28/0x50 [radeon]
[ 6500.667620]  [<ffffffffa02adf2a>] ? radeon_gem_object_free+0x3a/0x40 [radeon]
[ 6500.667623]  [<ffffffff817068db>] ? drm_gem_object_handle_unreference_unlocked+0xdb/0x110
[ 6500.667625]  [<ffffffff81706a9d>] ? drm_gem_object_release_handle+0x5d/0x90
[ 6500.667627]  [<ffffffff8161ab83>] ? idr_for_each+0xa3/0xf0
[ 6500.667630]  [<ffffffff81994e6e>] ? mutex_lock+0xe/0x2a
[ 6500.667632]  [<ffffffff817072dc>] ? drm_gem_release+0x1c/0x30
[ 6500.667635]  [<ffffffff81706093>] ? drm_release+0x403/0x4f0
[ 6500.667637]  [<ffffffff811a2d33>] ? __fput+0xe3/0x200
[ 6500.667640]  [<ffffffff8109995c>] ? task_work_run+0xcc/0xf0
[ 6500.667643]  [<ffffffff810808ac>] ? do_exit+0x2dc/0xaf0
[ 6500.667644]  [<ffffffff810017f9>] ? __switch_to+0x449/0x520
[ 6500.667647]  [<ffffffff81081ebd>] ? do_group_exit+0x3d/0xa0
[ 6500.667649]  [<ffffffff8108cd94>] ? get_signal+0x1b4/0x5c0
[ 6500.667651]  [<ffffffff810b86fe>] ? remove_wait_queue+0x1e/0x70
[ 6500.667654]  [<ffffffff81002309>] ? do_signal+0x49/0xa20
[ 6500.667656]  [<ffffffff810800b0>] ? task_stopped_code+0x50/0x50
[ 6500.667659]  [<ffffffff810f91b7>] ? C_SYSC_wait4+0xf7/0x100
[ 6500.667660]  [<ffffffff810a7d9d>] ? wake_up_new_task+0xed/0x190
[ 6500.667663]  [<ffffffff81002d3d>] ? do_notify_resume+0x5d/0x80
[ 6500.667665]  [<ffffffff81996e27>] ? int_signal+0x12/0x17
[ 6500.667683] Code: 48 89 10 48 b8 00 01 10 00 00 00 ad de 48 89 43 60 48 b8 00 02 20 00 00 00 ad de 48 8d 53 60 48 89 43 68 48 8b 43 70 48 8b 48 38 <48> 89 51 08 48 89 4b 60 48 8d 48 38 48 89 4b 68 48 89 50 38 48 
[ 6500.667696] RIP  [<ffffffffa0360653>] radeon_vm_bo_invalidate+0x73/0xa0 [radeon]
[ 6500.667697]  RSP <ffff8800817efa68>
[ 6500.667698] CR2: 0000000000000008
[ 6500.667699] ---[ end trace 595ff6f3d8afbf5b ]---
[ 6500.667700] Fixing recursive fault but reboot is needed!
[ 6511.926155] eth0: port 1(peth0) entered forwarding state
[ 7247.320362] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 7247.339662] ata6.00: waking up from sleep
[ 7247.351674] ata6: hard resetting link
[ 7248.077942] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 7248.101388] ata6.00: configured for UDMA/100
[ 7248.114184] ata6: EH complete

So after this, I have neither a working nor a hibernated machine.

I'm using Slackware64-current on an A10-7850K CPU and an Asus A88X-PLUS board.
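
For anyone reading the trace above: the number after "Oops:" is the x86 page-fault error code, and decoding it narrows down what kind of access failed. This is a minimal sketch, assuming the usual x86 bit layout (bit 0: protection violation rather than a not-present page, bit 1: write, bit 2: user mode, bit 3: reserved bit set, bit 4: instruction fetch). The name decode_oops_code is made up for illustration and is not a kernel tool.

```python
# Decode the hexadecimal code printed after "Oops:" on x86.
# Each entry: (bit mask, meaning when set, meaning when clear).
FLAGS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]

def decode_oops_code(code_hex):
    """Return a short human-readable description of an Oops error code."""
    code = int(code_hex, 16)
    parts = [set_text if code & bit else clear_text
             for bit, set_text, clear_text in FLAGS]
    if code & 0x08:
        parts.append("reserved bit set in page table")
    if code & 0x10:
        parts.append("instruction fetch")
    return ", ".join(parts)

# The value from the trace in this report.
print(decode_oops_code("0002"))
```

For this trace, 0002 reads as a kernel-mode write to a page that is not present, which is consistent with CR2 = 0x8: a store through a NULL pointer plus an 8-byte offset.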
Comment 1 higuita 2015-05-10 18:06:53 UTC
Created attachment 176351
New crash dump

I got a new crash, but this time, after resetting the PC, I could load the hibernation image.

Also, I could not capture this dump over the serial link, even with no_console_suspend.

I'm also now using kernel 4.0.1.
Comment 2 Michel Dänzer 2015-05-13 07:16:55 UTC
(In reply to higuita from comment #1)
> New crash dump

Does building the kernel without CONFIG_HSA_AMD avoid this one?
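
Before testing, it may help to confirm that the rebuilt kernel really has the option off (the amdkfd module listed in the first trace is what this option builds). The sketch below checks the text of a kernel .config for a given option. The helper name kconfig_state and the sample config contents are illustrative assumptions, not taken from the reporter's system.

```python
import re

def kconfig_state(config_text, option):
    """Return the value of a Kconfig option ('y', 'm', ...) or 'n' if unset.

    'n' covers both the explicit "# CONFIG_X is not set" line and the
    option being absent from the file altogether.
    """
    pattern = re.compile(r"^%s=(\S+)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else "n"

# Illustrative .config excerpt (assumed, not from the report).
sample = """\
CONFIG_DRM_RADEON=m
# CONFIG_HSA_AMD is not set
"""

print(kconfig_state(sample, "CONFIG_HSA_AMD"))
print(kconfig_state(sample, "CONFIG_DRM_RADEON"))
```

On a real system the text would come from the .config in the build tree, or from /proc/config.gz if the running kernel exposes it.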
Comment 3 higuita 2015-05-28 03:27:10 UTC
> Does building the kernel without CONFIG_HSA_AMD avoid this one?

Yes, it looks like it helps... so something in HSA is sometimes breaking hibernation.

I'm now on kernel 4.0.4 and it seems that I can almost hibernate: it manages to dump the memory to swap and turn off the machine successfully.

The problem now is loading the hibernated image from swap; I'm getting consistent crashes. Here is the dump:

[    7.926169] sd 5:0:0:0: [sdd] Attached SCSI disk
[    8.259718]  sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 sda10 >
[    8.277704] sd 0:0:0:0: [sda] Attached SCSI disk
[    8.293411] Freezing user space processes ... (elapsed 0.000 seconds) done.
[   10.573914] PM: Using 3 thread(s) for decompression.
[   10.573914] PM: Loading and decompressing image data (338911 pages)...
[   11.510076] random: nonblocking pool is initialized
[   11.659242] PM: Image loading progress:   0%
[   13.442604] PM: Image loading progress:  10%
[   15.286665] PM: Image loading progress:  20%
[   15.590044] BUG: unable to handle kernel paging request at ffff88005e70d000
[   15.610900] IP: [<ffffffff810c89d3>] load_image_lzo+0x783/0xc30
[   15.628633] PGD 2146067 PUD 21efff067 PMD 21effd067 PTE 0
[   15.644832] Oops: 0002 [#1] SMP 
[   15.654499] Modules linked in:
[   15.663633] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.0.4-slack #8
[   15.682639] Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 2302 04/02/2015
[   15.709452] task: ffff880217940000 ti: ffff880217948000 task.ti: ffff880217948000
[   15.731844] RIP: 0010:[<ffffffff810c89d3>]  [<ffffffff810c89d3>] load_image_lzo+0x783/0xc30
[   15.756849] RSP: 0000:ffff88021794bce8  EFLAGS: 00010246
[   15.772734] RAX: 0000000a00000002 RBX: 0000000000014dc5 RCX: ffff88005e70d000
[   15.794084] RDX: 0000000000000000 RSI: ffffc900047a50b0 RDI: ffff88005e70d008
[   15.815435] RBP: 0000000000008463 R08: 0000000000000000 R09: ffff880215cd7178
[   15.836784] R10: ffff88021794bc64 R11: ffff880215cd7178 R12: ffffc9000474c000
[   15.858133] R13: ffff88021794bdf8 R14: ffffc900047a0058 R15: 0000000000005000
[   15.879484] FS:  0000000000000000(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000
[   15.903693] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.920881] CR2: ffff88005e70d000 CR3: 0000000001e0c000 CR4: 00000000000406e0
[   15.942231] Stack:
[   15.948229]  ffffc9000475d000 0000000308574740 ffff880215ca0900 ffffea0000000001
[   15.970409]  ffff88021794be10 00052bdf00000000 ffff88021794bdf8 0000014000000000
[   15.992594]  0000006700002000 00000001fafc9615 ffff880215ca0930 ffff880215ca0918
[   16.014775] Call Trace:
[   16.022078]  [<ffffffff810bb750>] ? wait_woken+0x90/0x90
[   16.037973]  [<ffffffff810c9b90>] ? swsusp_read+0x250/0x340
[   16.054639]  [<ffffffff810c5110>] ? hibernation_restore+0x140/0x140
[   16.073387]  [<ffffffff819b6b32>] ? printk+0x4d/0x52
[   16.088231]  [<ffffffff810c5110>] ? hibernation_restore+0x140/0x140
[   16.106979]  [<ffffffff810c535d>] ? software_resume+0x24d/0x2b0
[   16.124687]  [<ffffffff810002e8>] ? do_one_initcall+0x98/0x1f0
[   16.142132]  [<ffffffff8109c700>] ? parse_args+0x180/0x410
[   16.158540]  [<ffffffff81f1d001>] ? kernel_init_freeable+0x172/0x1f8
[   16.177546]  [<ffffffff819b4370>] ? rest_init+0x70/0x70
[   16.193172]  [<ffffffff819b437e>] ? kernel_init+0xe/0xf0
[   16.209057]  [<ffffffff819c1a88>] ? ret_from_fork+0x58/0x90
[   16.225722]  [<ffffffff819b4370>] ? rest_init+0x70/0x70
[   16.241346] Code: ff eb 16 0f 1f 44 00 00 49 81 c7 00 10 00 00 4d 39 7e 48 0f 86 73 02 00 00 4b 8d 74 37 58 49 8b 4d 08 31 d2 48 8b 06 48 8d 79 08 <48> 89 01 48 8b 86 f8 0f 00 00 48 83 e7 f8 48 89 81 f8 0f 00 00 
[   16.298669] RIP  [<ffffffff810c89d3>] load_image_lzo+0x783/0xc30
[   16.316649]  RSP <ffff88021794bce8>
[   16.327070] CR2: ffff88005e70d000
[   16.336971] ---[ end trace adca2fce0a7255b9 ]---
[   16.350821] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[   16.350821] 
[   16.378227] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[   16.408684] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[   16.408684] 

I don't know if I should open a new bug for this crash and turn this bug into "HSA breaks hibernation", or if I should keep both dumps in this bug report.

Thanks for the help
Comment 4 Oded Gabbay 2015-05-28 06:02:32 UTC
I forwarded this to the kfd team at AMD, and I will also try to look at this problem in the next few days.

Oded
Comment 5 Michel Dänzer 2015-05-28 06:11:12 UTC
(In reply to higuita from comment #3)
> Don't know is i should open a new bug for this crash [...]

Please do, that doesn't look directly related to the GPU drivers.
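
One way to see this from the logs: the faulting frame of the first oops carries a [radeon] module tag, while the second faults in load_image_lzo with no module tag, an empty "Modules linked in:" list, and PID 1 still in early boot. A rough sketch for pulling those fields out of a RIP line, assuming the oops format shown in this report; faulting_frame is a hypothetical helper, not an existing tool.

```python
import re

# Matches e.g. "radeon_vm_bo_invalidate+0x73/0xa0 [radeon]".
# The trailing "[module]" tag is absent when the code is not in a module.
FRAME = re.compile(
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def faulting_frame(line):
    """Extract symbol, offset, size and module from an oops RIP line."""
    match = FRAME.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["offset"] = int(info["offset"], 16)
    info["size"] = int(info["size"], 16)
    return info

# The two RIP lines from this report.
first = "RIP  [<ffffffffa0360653>] radeon_vm_bo_invalidate+0x73/0xa0 [radeon]"
second = "RIP  [<ffffffff810c89d3>] load_image_lzo+0x783/0xc30"

for line in (first, second):
    frame = faulting_frame(line)
    print(frame["symbol"], frame["module"] or "(no module tag)")
```

A missing module tag only says the faulting code is part of the kernel image itself; here the call trace (swsusp_read, software_resume) points at the hibernation image loader.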
Comment 6 Szőgyényi Gábor 2017-03-06 20:07:01 UTC
Please try to reproduce this bug with the latest kernel image.