Bug 100071

Summary: Crash on several PRIME radeon usage
Product: Drivers
Reporter: higuita (higuita)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED OBSOLETE
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.0.4
Subsystem:
Regression: No
Bisected commit-id:

Description higuita 2015-06-17 18:33:11 UTC
I have a Lenovo ThinkPad S440 with this hardware:

$ lspci | grep "Display\|VGA"
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)
06:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun XT [Radeon HD 8670A/8670M/8690M] (rev ff)

I'm using Ubuntu 15.04, but with kernel 4.0.4.
I'm using Mesa 10.5.2.

I use this to set up PRIME:

xrandr --setprovideroffloadsink radeon Intel
xcompmgr &

and I start the War Thunder game (closed source, bug-free game) using this:
DRI_PRIME=1 ./aces


Everything works fine, but after about 2 to 4 start/exit cycles of the game I usually get a crash that locks up X, around the point where the login screen should appear. Here is the dmesg log:


Jun 16 13:25:22 danielleite kernel: [   44.018512] [drm] ib test on ring 2 succeeded in 0 usecs
Jun 16 13:25:22 danielleite kernel: [   44.018526] [drm] ib test on ring 3 succeeded in 0 usecs
Jun 16 13:25:22 danielleite kernel: [   44.018539] [drm] ib test on ring 4 succeeded in 0 usecs
Jun 16 14:13:22 danielleite kernel: [ 2926.554564] BUG: unable to handle kernel paging request at 000000000000140c
Jun 16 14:13:22 danielleite kernel: [ 2926.554589] IP: [<ffffffffc045db9f>] radeon_ttm_tt_create+0xaf/0xe0 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.554628] PGD c61db067 PUD c61e7067 PMD 0 
Jun 16 14:13:22 danielleite kernel: [ 2926.554643] Oops: 0000 [#1] SMP 
Jun 16 14:13:22 danielleite kernel: [ 2926.554653] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc rfcomm bnep ax88179_178a usbnet arc4 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm kvm mac80211 iwlwifi joydev serio_raw snd_hda_codec_hdmi rtsx_pci_ms cfg80211 lpc_ich memstick snd_hda_codec_conexant uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops videobuf2_core btusb v4l2_common snd_usb_audio videodev thinkpad_acpi snd_hda_intel bluetooth snd_hda_controller snd_usbmidi_lib media snd_hda_codec nvram snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event mei_me mei shpchp snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore intel_smartconnect mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq dm_crypt hid_generic usbhid hid rtsx_pci_sdmmc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel amdkfd amd_iommu_v2 aesni_intel aes_x86_64 lrw gf128mul radeon glue_helper ablk_helper cryptd i915 psmouse ahci libahci ttm i2c_algo_bit drm_kms_helper drm r8169 rtsx_pci mii video
Jun 16 14:13:22 danielleite kernel: [ 2926.555023] CPU: 3 PID: 2576 Comm: aces Not tainted 4.0.4-040004-generic #201505171336
Jun 16 14:13:22 danielleite kernel: [ 2926.555046] Hardware name: LENOVO 20AYA05KPG/20AYA05KPG, BIOS J3ET59WW (1.59 ) 07/15/2014
Jun 16 14:13:22 danielleite kernel: [ 2926.555068] task: ffff8801fbe6bc00 ti: ffff8800b71a0000 task.ti: ffff8800b71a0000
Jun 16 14:13:22 danielleite kernel: [ 2926.555088] RIP: 0010:[<ffffffffc045db9f>]  [<ffffffffc045db9f>] radeon_ttm_tt_create+0xaf/0xe0 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555124] RSP: 0018:ffff8800b71a39c8  EFLAGS: 00010202
Jun 16 14:13:22 danielleite kernel: [ 2926.555138] RAX: 000000000000125c RBX: ffff8802205cac00 RCX: 00000000ffffffff
Jun 16 14:13:22 danielleite kernel: [ 2926.555157] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8802205cac00
Jun 16 14:13:22 danielleite kernel: [ 2926.555177] RBP: ffff8800b71a39f8 R08: 0000000000000001 R09: ffffea0007b2d700
Jun 16 14:13:22 danielleite kernel: [ 2926.555196] R10: ffffffffc04cf7cc R11: ffff8802211a1a38 R12: ffff8800b1cac1f0
Jun 16 14:13:22 danielleite kernel: [ 2926.555215] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000001
Jun 16 14:13:22 danielleite kernel: [ 2926.555234] FS:  00007f7c42151700(0000) GS:ffff88022f2c0000(0000) knlGS:0000000000000000
Jun 16 14:13:22 danielleite kernel: [ 2926.555256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 16 14:13:22 danielleite kernel: [ 2926.555271] CR2: 000000000000140c CR3: 00000000b1c0b000 CR4: 00000000001407e0
Jun 16 14:13:22 danielleite kernel: [ 2926.555290] Stack:
Jun 16 14:13:22 danielleite kernel: [ 2926.555296]  ffff880000000000 ffff8802205cac00 ffff8800b1cac1f0 ffff8800b71a3c10
Jun 16 14:13:22 danielleite kernel: [ 2926.555318]  ffff8800b71a3c10 7fffffffffffffff ffff8800b71a3a48 ffffffff8155b896
Jun 16 14:13:22 danielleite kernel: [ 2926.555340]  0000000000000000 0100000000000000 0000000000000000 ffff8802205cac00
Jun 16 14:13:22 danielleite kernel: [ 2926.555361] Call Trace:
Jun 16 14:13:22 danielleite kernel: [ 2926.555371]  [<ffffffff8155b896>] fence_wait_timeout.part.10+0x36/0xe0
Jun 16 14:13:22 danielleite kernel: [ 2926.555390]  [<ffffffff8155bc11>] fence_wait_timeout+0x61/0x90
Jun 16 14:13:22 danielleite kernel: [ 2926.555424]  [<ffffffffc052b8cf>] radeon_sync_resv+0x4f/0x120 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555456]  [<ffffffffc0478438>] radeon_cs_sync_rings+0x58/0x70 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555486]  [<ffffffffc04787c0>] radeon_cs_ib_vm_chunk+0x100/0x190 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555517]  [<ffffffffc04794db>] radeon_cs_ioctl+0x1bb/0x200 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555545]  [<ffffffffc01bfee6>] drm_ioctl+0x2e6/0x590 [drm]
Jun 16 14:13:22 danielleite kernel: [ 2926.555573]  [<ffffffffc0479320>] ? radeon_cs_parser_init+0x400/0x400 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555600]  [<ffffffffc044034d>] radeon_drm_ioctl+0x5d/0xa0 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555619]  [<ffffffff8120f5b5>] do_vfs_ioctl+0x75/0x320
Jun 16 14:13:22 danielleite kernel: [ 2926.555637]  [<ffffffff812198f5>] ? __fget_light+0x25/0x70
Jun 16 14:13:22 danielleite kernel: [ 2926.555653]  [<ffffffff8120f8f1>] SyS_ioctl+0x91/0xb0
Jun 16 14:13:22 danielleite kernel: [ 2926.555679]  [<ffffffff817ec827>] ? schedule+0x37/0x90
Jun 16 14:13:22 danielleite kernel: [ 2926.555702]  [<ffffffff817f098d>] system_call_fastpath+0x16/0x1b
Jun 16 14:13:22 danielleite kernel: [ 2926.555719] Code: 85 d2 75 40 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 90 48 8b 87 e0 f8 ff ff 49 89 c8 89 d1 4c 89 f2 <48> 8b 80 b0 01 00 00 48 8b 70 68 e8 71 45 dd ff eb c7 0f 1f 80 
Jun 16 14:13:22 danielleite kernel: [ 2926.555804] RIP  [<ffffffffc045db9f>] radeon_ttm_tt_create+0xaf/0xe0 [radeon]
Jun 16 14:13:22 danielleite kernel: [ 2926.555835]  RSP <ffff8800b71a39c8>
Jun 16 14:13:22 danielleite kernel: [ 2926.555844] CR2: 000000000000140c
Jun 16 14:13:22 danielleite kernel: [ 2926.560619] ---[ end trace 8f6dca67e510f276 ]---
Jun 16 14:15:25 danielleite kernel: [ 3049.355455] sysrq: SysRq : Emergency Sync
Jun 16 14:15:25 danielleite kernel: [ 3049.377271] Emergency Sync complete
Jun 16 14:15:25 danielleite kernel: [ 3049.547586] sysrq: SysRq : Emergency Sync
Jun 16 14:15:25 danielleite kernel: [ 3049.554808] Emergency Sync complete
Jun 16 14:15:25 danielleite kernel: [ 3049.835815] sysrq: SysRq : Emergency Remount R/O

Moving the game window between screens just before the login screen appears also causes a lockup. I assume it is related to the crash above:


Jun 16 13:24:02 danielleite kernel: [35042.121786] [drm] ib test on ring 4 succeeded in 0 usecs
Jun 16 13:24:05 danielleite kernel: [35045.683053] BUG: unable to handle kernel paging request at 000000232dc12080
Jun 16 13:24:05 danielleite kernel: [35045.683082] IP: [<ffffffffc05a9069>] radeon_fence_signaled+0x49/0x90 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.683134] PGD 223d16067 PUD 0 
Jun 16 13:24:05 danielleite kernel: [35045.683146] Oops: 0000 [#1] SMP 
Jun 16 13:24:05 danielleite kernel: [35045.683157] Modules linked in: nls_utf8 ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c cpuid xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc rfcomm bnep ax88179_178a usbnet arc4 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlmvm mac80211 uvcvideo joydev iwlwifi serio_raw videobuf2_vmalloc videobuf2_memops videobuf2_core snd_hda_codec_hdmi snd_usb_audio lpc_ich v4l2_common snd_hda_codec_conexant snd_hda_codec_generic snd_usbmidi_lib videodev cfg80211 btusb media rtsx_pci_ms snd_hda_intel snd_hda_controller bluetooth memstick snd_hda_codec snd_hwdep snd_pcm mei_me mei shpchp thinkpad_acpi nvram snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore intel_smartconnect mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq dm_crypt hid_generic usbhid hid rtsx_pci_sdmmc amdkfd crct10dif_pclmul amd_iommu_v2 crc32_pclmul ghash_clmulni_intel radeon aesni_intel i915 aes_x86_64 lrw i2c_algo_bit gf128mul ttm glue_helper ablk_helper drm_kms_helper cryptd drm ahci psmouse libahci r8169 rtsx_pci mii video
Jun 16 13:24:05 danielleite kernel: [35045.683559] CPU: 2 PID: 18267 Comm: aces Not tainted 4.0.4-040004-generic #201505171336
Jun 16 13:24:05 danielleite kernel: [35045.683582] Hardware name: LENOVO 20AYA05KPG/20AYA05KPG, BIOS J3ET59WW (1.59 ) 07/15/2014
Jun 16 13:24:05 danielleite kernel: [35045.683605] task: ffff880223c1b200 ti: ffff880155724000 task.ti: ffff880155724000
Jun 16 13:24:05 danielleite kernel: [35045.683626] RIP: 0010:[<ffffffffc05a9069>]  [<ffffffffc05a9069>] radeon_fence_signaled+0x49/0x90 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.683669] RSP: 0018:ffff880155727948  EFLAGS: 00010202
Jun 16 13:24:05 danielleite kernel: [35045.683684] RAX: 000000226dbf9020 RBX: ffff880035af8180 RCX: 0000000001312bac
Jun 16 13:24:05 danielleite kernel: [35045.683704] RDX: 000000024b954de0 RSI: 0000000024b954de RDI: 00000000c0018300
Jun 16 13:24:05 danielleite kernel: [35045.683724] RBP: ffff880155727968 R08: 0000000000018440 R09: ffffea0001d35e80
Jun 16 13:24:05 danielleite kernel: [35045.683744] R10: ffffffffc061c7cc R11: ffff8802212e1a38 R12: 0000000000000000
Jun 16 13:24:05 danielleite kernel: [35045.683763] R13: 000000232dc12080 R14: 0000000000000100 R15: ffff880223c1b200
Jun 16 13:24:05 danielleite kernel: [35045.683783] FS:  00007f3773ac7700(0000) GS:ffff88022f280000(0000) knlGS:0000000000000000
Jun 16 13:24:05 danielleite kernel: [35045.683805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 16 13:24:05 danielleite kernel: [35045.683821] CR2: 000000232dc12080 CR3: 00000002242b6000 CR4: 00000000001407e0
Jun 16 13:24:05 danielleite kernel: [35045.683841] Stack:
Jun 16 13:24:05 danielleite kernel: [35045.683847]  ffff880155727968 ffff880074d7a600 ffff880203f7b300 ffff8802212e1a38
Jun 16 13:24:05 danielleite kernel: [35045.683870]  ffff880155727998 ffffffffc061c854 ffff880155727bd0 ffff8802212e1990
Jun 16 13:24:05 danielleite kernel: [35045.683892]  ffff880155727bd0 0000000000004e20 ffff880155727a98 ffffffffc061cf68
Jun 16 13:24:05 danielleite kernel: [35045.683914] Call Trace:
Jun 16 13:24:05 danielleite kernel: [35045.683947]  [<ffffffffc061c854>] radeon_sa_bo_try_free+0x64/0x80 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.683984]  [<ffffffffc061cf68>] radeon_sa_bo_new+0xf8/0x3b0 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684016]  [<ffffffffc05c50b0>] ? radeon_irq_kms_disable_hpd+0xb0/0xb0 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684056]  [<ffffffffc06782a2>] radeon_ib_get+0x42/0xe0 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684087]  [<ffffffffc05c5245>] radeon_cs_ib_fill+0x85/0x220 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684116]  [<ffffffffc05c642b>] radeon_cs_ioctl+0x10b/0x200 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684153]  [<ffffffffc02f0ee6>] drm_ioctl+0x2e6/0x590 [drm]
Jun 16 13:24:05 danielleite kernel: [35045.684184]  [<ffffffffc05c6320>] ? radeon_cs_parser_init+0x400/0x400 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684218]  [<ffffffffc058d34d>] radeon_drm_ioctl+0x5d/0xa0 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684240]  [<ffffffff8120f5b5>] do_vfs_ioctl+0x75/0x320
Jun 16 13:24:05 danielleite kernel: [35045.684256]  [<ffffffff81096af9>] ? task_work_run+0xd9/0xf0
Jun 16 13:24:05 danielleite kernel: [35045.684275]  [<ffffffff812198f5>] ? __fget_light+0x25/0x70
Jun 16 13:24:05 danielleite kernel: [35045.684291]  [<ffffffff8120f8f1>] SyS_ioctl+0x91/0xb0
Jun 16 13:24:05 danielleite kernel: [35045.684308]  [<ffffffff817f098d>] system_call_fastpath+0x16/0x1b
Jun 16 13:24:05 danielleite kernel: [35045.684325] Code: 89 fb 4c 89 6d f8 74 39 8b 77 68 4c 8b 67 60 48 8b 7f 58 89 f0 48 89 c2 48 c1 e0 08 48 c1 e2 04 48 29 d0 4c 8d ac 07 60 0d 00 00 <49> 8b 45 00 49 39 c4 77 1e 48 89 df e8 06 2a fb c0 b8 01 00 00 
Jun 16 13:24:05 danielleite kernel: [35045.684416] RIP  [<ffffffffc05a9069>] radeon_fence_signaled+0x49/0x90 [radeon]
Jun 16 13:24:05 danielleite kernel: [35045.684446]  RSP <ffff880155727948>
Jun 16 13:24:05 danielleite kernel: [35045.684456] CR2: 000000232dc12080
Jun 16 13:24:05 danielleite kernel: [35045.689497] ---[ end trace 95be8a723cce70a1 ]---
Jun 16 13:24:27 danielleite kernel: [35068.016620] sysrq: SysRq : Emergency Sync
Jun 16 13:24:28 danielleite kernel: [35068.134038] Emergency Sync complete
Jun 16 13:24:28 danielleite kernel: [35068.304852] sysrq: SysRq : Emergency Remount R/O
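
To compare the two oopses side by side, a small script along these lines could pull the faulting address and function out of a saved log. This is only a rough sketch: the name `summarize_oops` and the regular expressions are assumptions based on the log format shown above, and other kernel versions may format these lines differently.

```python
import re

# Both patterns assume the oops layout seen in the logs above (kernel 4.0.x).
FAULT_RE = re.compile(
    r"BUG: unable to handle kernel paging request at ([0-9a-f]+)"
)
# \b keeps this from matching the later "RIP: ..." register line.
IP_RE = re.compile(
    r"\bIP: \[<[0-9a-f]+>\] (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_oops(log_text):
    """Return one (fault_address, function, module) tuple per oops found.

    `summarize_oops` is a hypothetical helper, not an existing tool.
    The module is None when the IP line carries no [module] suffix.
    """
    results = []
    fault = None
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            # Remember the address until the matching IP line shows up.
            fault = match.group(1)
            continue
        match = IP_RE.search(line)
        if match and fault is not None:
            results.append((fault, match.group(1), match.group(2)))
            fault = None
    return results
```

Fed the two logs above, this should report `radeon_ttm_tt_create` for the first crash and `radeon_fence_signaled` for the second, both in the `radeon` module.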


I can almost always reproduce this within a few tries, and I will try updating to kernel 4.1 when it is released.
Comment 1 higuita 2015-09-02 16:48:12 UTC
The problem disappeared with recent kernel or Mesa updates, so closing.