Bug 90851 - radeonsi on pitcairn: nine and skyrim - unable to handle kernel paging request
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-06 11:10 UTC by Christoph Haag
Modified: 2015-02-20 12:01 UTC

See Also:
Kernel Version: 3.19-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
full dmesg (106.28 KB, text/plain)
2015-01-06 11:10 UTC, Christoph Haag

Description Christoph Haag 2015-01-06 11:10:37 UTC
Created attachment 162571
full dmesg

It does not happen very often, but when it does, the game freezes and has to be killed. radeon takes a little while, but recovers.

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M] (rev ff)

Mesa was a then-recent git master build with master from https://github.com/iXit/Mesa-3D.git merged into it, so it is possible that this does not happen with pure upstream Mesa.

I think it has only happened in skyrim with nine so far.

[106308.053804] BUG: unable to handle kernel paging request at ffff8004a2fa79e8
[106308.055548] IP: [<ffffffffa01c03ee>] ttm_eu_reserve_buffers+0xbe/0x390 [ttm]
[106308.057054] PGD 0 
[106308.057059] Oops: 0000 [#1] PREEMPT SMP 
[106308.057088] Modules linked in: hidp uvcvideo rfcomm joydev btrfs xor ecb bnep msr videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common raid6_pq videodev media coretemp btusb arc4 mousedev intel_rapl bluetooth iosf_mbi x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi kvm snd_hda_codec_realtek iwldvm snd_hda_codec_generic led_class mac80211 crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_intel ghash_clmulni_intel snd_hda_controller aesni_intel snd_hda_codec aes_x86_64 lrw gf128mul glue_helper snd_hwdep iwlwifi snd_pcm ablk_helper cfg80211 snd_timer psmouse cryptd r8169 i2c_i801 serio_raw snd pcspkr rtsx_pci_ms soundcore memstick rfkill lpc_ich mii wmi tpm_tis tpm mei_me mei shpchp evdev battery ac thermal processor mac_hid sch_fq_codel nfs
[106308.057104]  lockd grace sunrpc fscache fuse ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod hid_generic usbhid hid rtsx_pci_sdmmc mmc_core atkbd libps2 ahci ehci_pci libahci libata xhci_pci xhci_hcd firewire_ohci ehci_hcd scsi_mod firewire_core crc_itu_t rtsx_pci usbcore usb_common i8042 serio radeon hwmon ttm i915 button intel_gtt video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: uvcvideo]
[106308.057107] CPU: 2 PID: 3057 Comm: TESV.exe Not tainted 3.19.0-1-mainline #1
[106308.057108] Hardware name: CLEVO                             P170EM/P170EM, BIOS 4.6.5 08/22/2012
[106308.057109] task: ffff88070cae13e0 ti: ffff8804a30dc000 task.ti: ffff8804a30dc000
[106308.057115] RIP: 0010:[<ffffffffa01c03ee>]  [<ffffffffa01c03ee>] ttm_eu_reserve_buffers+0xbe/0x390 [ttm]
[106308.057116] RSP: 0018:ffff8804a30df738  EFLAGS: 00010286
[106308.057117] RAX: 0000000000000000 RBX: ffff8804a30dfb30 RCX: 0000000000000008
[106308.057117] RDX: 0000000000000004 RSI: 0000000000000058 RDI: 0000000000000000
[106308.057118] RBP: ffff8804a30df788 R08: ffffc9002295b488 R09: 0000000000000000
[106308.057118] R10: ffffffff817215ea R11: ffffea0013907a00 R12: 0000000000000000
[106308.057119] R13: ffff8004a2fa7868 R14: ffff880808e08000 R15: ffff88031f24c388
[106308.057120] FS:  000000007ffd8000(0063) GS:ffff88082f280000(006b) knlGS:00000000eb4adb40
[106308.057121] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[106308.057121] CR2: ffff8004a2fa79e8 CR3: 000000070d02d000 CR4: 00000000001407e0
[106308.057122] Stack:
[106308.057124]  ffff880809282d80 ffff8804a30df7e0 0100000000000000 ffff8804a30dfcd8
[106308.057125]  ffff880337cba800 ffff8804a30dfb30 ffff8804a30dfa80 ffff8804a30dfb30
[106308.057127]  ffff880808e08000 ffff8804a30dfae8 ffff8804a30df828 ffffffffa01eeaa7
[106308.057127] Call Trace:
[106308.057144]  [<ffffffffa01eeaa7>] radeon_bo_list_validate+0x97/0x230 [radeon]
[106308.057158]  [<ffffffffa0206f9d>] radeon_cs_parser_relocs+0x34d/0x440 [radeon]
[106308.057172]  [<ffffffffa0207ac0>] radeon_cs_ioctl+0x2a0/0x810 [radeon]
[106308.057176]  [<ffffffff81014935>] ? __switch_to+0x445/0x5f0
[106308.057186]  [<ffffffffa001ccff>] drm_ioctl+0x1df/0x680 [drm]
[106308.057197]  [<ffffffffa01cd04c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[106308.057208]  [<ffffffffa02cf5d4>] radeon_kms_compat_ioctl+0x14/0x30 [radeon]
[106308.057211]  [<ffffffff812276c0>] compat_SyS_ioctl+0xf0/0x1260
[106308.057214]  [<ffffffff810f2fe4>] ? compat_SyS_futex+0x84/0x1a0
[106308.057216]  [<ffffffff81091339>] ? task_work_run+0xd9/0xf0
[106308.057220]  [<ffffffff815639b6>] sysenter_dispatch+0x7/0x25
[106308.057234] Code: 00 00 00 85 c0 0f 8f d2 01 00 00 41 80 7f 18 00 0f 85 a7 01 00 00 4d 8b 3f 49 39 df 0f 84 eb 01 00 00 48 83 7d c8 00 4d 8b 6f 10 <49> 8b bd 80 01 00 00 0f 84 65 01 00 00 80 7d c7 00 48 8b 75 c8 
[106308.057238] RIP  [<ffffffffa01c03ee>] ttm_eu_reserve_buffers+0xbe/0x390 [ttm]
[106308.057239]  RSP <ffff8804a30df738>
[106308.057239] CR2: ffff8004a2fa79e8
[106308.069117] ---[ end trace 086e470f5f9bd070 ]---
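
The register dump and the Code: line above are enough to cross-check the faulting access by hand. The following is only a sketch: it decodes the seven bytes starting at the <49> marker (which should be a mov with a 32-bit displacement) and checks that the base register plus the displacement gives the address reported in CR2. The variable names are made up for this illustration; the values are copied from the oops.

```python
# Values copied from the oops above.
R13 = 0xffff8004a2fa7868
CR2 = 0xffff8004a2fa79e8

# Faulting instruction: the bytes starting at the <49> marker on the Code: line.
insn = bytes.fromhex("49 8b bd 80 01 00 00")

rex, opcode, modrm = insn[0], insn[1], insn[2]
mod = modrm >> 6                                  # 0b10 means a 32-bit displacement follows
reg = ((modrm >> 3) & 7) | (((rex >> 2) & 1) << 3)  # destination, extended by REX.R
rm = (modrm & 7) | ((rex & 1) << 3)               # base register, extended by REX.B
disp = int.from_bytes(insn[3:7], "little", signed=True)

# 0x8b is "mov r64, r/m64" when REX.W is set.
assert opcode == 0x8b and mod == 0b10

base_is_r13 = (rm == 13)
dest_is_rdi = (reg == 7)
fault_address = R13 + disp

print("mov %#x(%%r13),%%rdi" % disp)
print("computed fault address:", hex(fault_address))
print("matches CR2:", fault_address == CR2)
```

Under these assumptions the load is from offset 0x180 of whatever R13 points at, so the pointer in R13 was already bad when ttm_eu_reserve_buffers dereferenced it.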
Comment 1 Michel Dänzer 2015-01-07 07:05:05 UTC
Can you try decoding the backtrace with scripts/decode_stacktrace.sh from the kernel tree?

Does it only happen with a 3.19 kernel, or also with older ones?
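
For reference, the information such a decoding step starts from is the symbol+offset/size triple on each trace line. The sketch below only extracts that triple from lines in the format shown in the description; it does not resolve anything to source lines, which needs a kernel built with debug symbols. The regular expression and the parse_trace_line helper are hypothetical, written for this illustration.

```python
import re

# Matches e.g. "[<ffffffffa01eeaa7>] radeon_bo_list_validate+0x97/0x230 [radeon]"
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<maybe>\? )?"
    r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<mod>\w+)\])?"
)

def parse_trace_line(line):
    """Return symbol, offset, size and module for one trace line, or None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m["sym"],
        "offset": int(m["off"], 16),
        "size": int(m["size"], 16),
        "module": m["mod"],                 # None for built-in kernel code
        "reliable": m["maybe"] is None,     # "?" marks frames the unwinder is unsure about
    }

lines = [
    "[106308.057144]  [<ffffffffa01eeaa7>] radeon_bo_list_validate+0x97/0x230 [radeon]",
    "[106308.057176]  [<ffffffff81014935>] ? __switch_to+0x445/0x5f0",
    "[106308.057122] Stack:",
]
for line in lines:
    print(parse_trace_line(line))
```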
Comment 2 Christoph Haag 2015-01-08 08:29:46 UTC
I'll have to build a kernel with symbols later and replicate it. It sometimes takes even a few hours of gameplay to have this happen, so it could take some time.

But I am relatively sure that it did not happen with 3.18.
Comment 3 Christoph Haag 2015-01-12 16:40:11 UTC
Hm, interesting. I compiled 3.19-rc4 with debug symbols.

I'm also testing Tom Stellard's VGPR register spilling llvm and mesa branches.

After a while of playing skyrim I got the familiar hang where skyrim just freezes, but I did NOT get "BUG: unable to handle kernel paging request" in the system log.

Instead I got this in the terminal from which I started skyrim:

radeon: mmap failed, errno: 12
radeon: mmap failed, errno: 12
radeon: mmap failed, errno: 12
radeon: mmap failed, errno: 12

I'm not very good with the wine debugger... Attaching to the TESV.exe process and then getting a backtrace shows:

Wine-dbg>bt
Backtrace:
=>0 0xf7702bee __kernel_vsyscall+0xe() in [vdso].so (0x7eada510)
  1 0xf7514e02 __lll_lock_wait+0x21() in libpthread.so.0 (0x7eada510)
  2 0xf750f5ae __GI___pthread_mutex_lock+0x8d() in libpthread.so.0 (0x7eada510)
  3 0xed73ba1d in d3dadapter9.so.1 (+0x128a1c) (0x7eada510)
  4 0x0069df9c in tesv (+0x29df9b) (0x7eada510)
  5 0xfff0e400 (0x526077e9)

I would try to find out where exactly in d3dadapter9.so.1 this happens, but I don't get how to properly attach winedbg --gdb, and addr2line didn't give a line with code, so I probably used it wrong.
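
One thing that may matter here: addr2line usually wants the module-relative offset (the +0x... value in the frame), not the absolute address, and the library has to contain debug info. The sketch below pulls the offset out of a winedbg frame and builds an addr2line command line from it. The library path is a placeholder, not the real location on the reporter's system, and the helper names are made up. It assumes the shared object is linked at base 0, as is typical for .so files.

```python
import re
import shlex

# Matches e.g. "3 0xed73ba1d in d3dadapter9.so.1 (+0x128a1c) (0x7eada510)"
FRAME_RE = re.compile(
    r"(?P<addr>0x[0-9a-f]+) in (?P<module>\S+) \(\+(?P<off>0x[0-9a-f]+)\)"
)

def parse_frame(frame):
    """Return (absolute address, module name, module-relative offset)."""
    m = FRAME_RE.search(frame)
    if m is None:
        raise ValueError("no module-relative offset in frame")
    return int(m["addr"], 16), m["module"], int(m["off"], 16)

def addr2line_command(frame, library_path):
    """Build an addr2line invocation for the frame's module-relative offset."""
    _, _, off = parse_frame(frame)
    # -f prints the function name, -C demangles, -e selects the binary.
    return ["addr2line", "-f", "-C", "-e", library_path, hex(off)]

frame = "  3 0xed73ba1d in d3dadapter9.so.1 (+0x128a1c) (0x7eada510)"
addr, module, off = parse_frame(frame)

# In this backtrace, address - 1 - offset is page aligned for frames 3 and 4,
# which suggests the printed offset is relative to the module's load base.
load_base = addr - 1 - off
print(hex(load_base), load_base % 0x1000 == 0)

# "/path/to/d3dadapter9.so.1" is a hypothetical path; substitute the real one.
print(shlex.join(addr2line_command(frame, "/path/to/d3dadapter9.so.1")))
```

If addr2line still prints only ?? for that offset, the most likely cause is that the library was built or installed without debug symbols.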
Comment 4 Michel Dänzer 2015-01-13 10:04:04 UTC
(In reply to Christoph Haag from comment #3)
> radeon: mmap failed, errno: 12

That's ENOMEM, so it looks like the kernel runs out of memory. Maybe a leak somewhere.
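
The number-to-name mapping can be checked quickly with Python's standard errno module. This is only a lookup sketch; the message text comes from the local C library and may differ between systems.

```python
import errno
import os

code = 12                          # value printed by "radeon: mmap failed, errno: 12"
name = errno.errorcode[code]       # symbolic name for the number
message = os.strerror(code)        # human-readable text from the C library

print(code, name, message)
```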

(In reply to Christoph Haag from comment #2)
> But I am relatively sure that it did not happen with 3.18.

Can you bisect?
Comment 5 Christoph Haag 2015-02-20 12:01:32 UTC
I didn't answer for a while because I didn't have too much time, but also because it hasn't happened anymore.
I think it has meanwhile been fixed, wherever the problem was.
