Bug 49531

Summary: Powering down inactive GPU while running X causes NULL pointer dereference
Product: Drivers
Component: Video(DRI - non Intel)
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.7.0-rc2+
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Igor Murzov
Assignee: drivers_video-dri
CC: alexdeucher, matttbe
Attachments:
  /var/log/syslog
  lspci -vvv
  Call stack information from GDB
  full dmesg output for v3.7.0-rc2+
  Kernel Oops with version 3.8-rc7

Description Igor Murzov 2012-10-25 18:44:55 UTC
Created attachment 84841
/var/log/syslog

On my muxed hybrid system (Lenovo IdeaPad U455), vga_switcheroo works fine if X is not running. But if both GPUs are powered and X is running, and I then try to power down the inactive GPU with

 # echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

the kernel crashes after a couple of minutes.
The following error message appears in /var/log/syslog:

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffffa046812b>] r600_pcie_gart_tlb_flush+0xeb/0x110 [radeon]
 PGD 8d24c067 PUD 8d24b067 PMD 0 
 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
 Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 pcmcia pcmcia_core xfs exportfs ext2 mbcache cpufreq_ondemand mperf lp parport_pc parport fuse uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media btusb bluetooth joydev brcmsmac radeon cordic brcmutil mac80211 ttm drm_kms_helper snd_hda_codec_hdmi snd_hda_codec_conexant drm cfg80211 snd_hda_intel agpgart snd_hda_codec snd_hwdep sp5100_tco ohci_hcd i2c_piix4 powernow_k8 ideapad_laptop i2c_algo_bit psmouse sparse_keymap snd_pcm ehci_hcd i2c_core processor freq_table rfkill thermal bcma rtc_cmos atl1c snd_page_alloc serio_raw k8temp thermal_sys snd_timer shpchp button battery snd soundcore hwmon evdev ac loop btrfs [last unloaded: pcmcia_core]
 CPU 0 
 Pid: 2291, comm: X Not tainted 3.7.0-rc2+ #70 LENOVO 20046                           /AMD CRB
 RIP: 0010:[<ffffffffa046812b>]  [<ffffffffa046812b>] r600_pcie_gart_tlb_flush+0xeb/0x110 [radeon]
 RSP: 0018:ffff8800894d1918  EFLAGS: 00010282
 RAX: ffffc90001322f34 RBX: 0000000000000000 RCX: 00000000aadfb000
 RDX: 0000000000000000 RSI: 0000000000002f34 RDI: ffff8800aad18000
 RBP: ffff8800894d1928 R08: 00000000dfffffff R09: 00000000000000d0
 R10: 0000000000031cd5 R11: 0000000000000000 R12: ffff8800aad18000
 R13: 0000000000000219 R14: 0000000000000219 R15: 0000000000000000
 FS:  00007f8819d618c0(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000008d18d000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process X (pid: 2291, threadinfo ffff8800894d0000, task ffff88008d128000)
 Stack:
  ffff8800aad18000 0000000000000219 ffff8800894d1958 ffffffffa043d365
  ffff8800896f1580 ffff8800894d1bc8 0000000000000000 ffff8800894d1bc8
  ffff8800894d1968 ffffffffa043a98a ffff8800894d1988 ffffffffa01ef097
 Call Trace:
  [<ffffffffa043d365>] radeon_gart_unbind+0xa5/0xe0 [radeon]
  [<ffffffffa043a98a>] radeon_ttm_backend_unbind+0x1a/0x20 [radeon]
  [<ffffffffa01ef097>] ttm_tt_unbind+0x27/0x40 [ttm]
  [<ffffffffa01f36e6>] ttm_bo_move_memcpy+0x266/0x580 [ttm]
  [<ffffffffa043b693>] radeon_bo_move+0xd3/0x180 [radeon]
  [<ffffffffa01f0f46>] ttm_bo_handle_move_mem+0x216/0x3f0 [ttm]
  [<ffffffffa01f1d10>] ? ttm_bo_mem_space+0x180/0x360 [ttm]
  [<ffffffffa01f2032>] ttm_bo_move_buffer+0x142/0x150 [ttm]
  [<ffffffffa01f20dc>] ttm_bo_validate+0x9c/0x120 [ttm]
  [<ffffffffa043c24a>] radeon_bo_pin_restricted+0x10a/0x1b0 [radeon]
  [<ffffffffa0449eab>] radeon_crtc_cursor_set+0x9b/0x4a0 [radeon]
  [<ffffffffa037e788>] ? drm_mode_object_find+0x68/0x90 [drm]
  [<ffffffffa0382425>] drm_mode_cursor_ioctl+0x105/0x160 [drm]
  [<ffffffffa0372443>] drm_ioctl+0x4c3/0x570 [drm]
  [<ffffffffa0382320>] ? drm_mode_setcrtc+0x5a0/0x5a0 [drm]
  [<ffffffff8154b07c>] ? __do_page_fault+0x25c/0x4b0
  [<ffffffff8115e7e0>] do_vfs_ioctl+0x90/0x570
  [<ffffffff8114d71b>] ? rw_verify_area+0x4b/0xf0
  [<ffffffff81057d67>] ? __set_task_blocked+0x37/0x80
  [<ffffffff8154748e>] ? _raw_spin_unlock_irq+0xe/0x20
  [<ffffffff8105a322>] ? __set_current_blocked+0x52/0x60
  [<ffffffff8115ed51>] sys_ioctl+0x91/0xb0
  [<ffffffff8154f899>] system_call_fastpath+0x16/0x1b
 Code: c1 e8 04 83 f8 02 74 2a 85 c0 74 cc 5b 41 5c 5d c3 0f 1f 80 00 00 00 00 31 d2 be 34 2f 00 00 48 8b 9f 98 03 00 00 e8 e5 37 ff ff <8b> 03 e9 4a ff ff ff 48 c7 c7 f0 9f 4b a0 31 c0 e8 f3 5d 0d e1 
 RIP  [<ffffffffa046812b>] r600_pcie_gart_tlb_flush+0xeb/0x110 [radeon]
  RSP <ffff8800894d1918>
 CR2: 0000000000000000
 ---[ end trace 4363d7f2115bb109 ]---
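
Editorial sketch (not part of the original report): before repeating the reproduction step above, it may help to record which vga_switcheroo clients are powered but not active. The helper below is a minimal sketch. The function name `inactive_powered_clients` and the `SWITCH_FILE` variable are made up for illustration, and the line format of the switch file is an assumption that should be checked against the running kernel.

```shell
#!/bin/sh
# Sketch: list vga_switcheroo clients that are powered but not active.
# ASSUMPTION: each line of the switch file has the form
#   id:name:active-flag:power-state:pci-address
# for example (placeholder PCI addresses, not taken from the report):
#   0:IGD:+:Pwr:0000:00:01.0
#   1:DIS: :Pwr:0000:01:00.0
# where '+' marks the active client and "Pwr"/"Off" is the power state.

SWITCH_FILE="${SWITCH_FILE:-/sys/kernel/debug/vgaswitcheroo/switch}"

# Print "name pci-address" for every client that is powered but not active.
# $1: optional path to the switch file (defaults to $SWITCH_FILE).
inactive_powered_clients() {
    # The PCI address itself contains colons, so it spans fields 5 to 7.
    awk -F: '$3 != "+" && $4 == "Pwr" { print $2, $5 ":" $6 ":" $7 }' \
        "${1:-$SWITCH_FILE}"
}
```

Run as root with debugfs mounted, `inactive_powered_clients` would print one line per powered, inactive client; empty output would mean there is nothing left to power down.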
Comment 1 Igor Murzov 2012-10-25 18:45:46 UTC
Created attachment 84851
lspci -vvv
Comment 2 Igor Murzov 2012-10-25 19:29:01 UTC
Created attachment 84861
Call stack information from GDB
Comment 3 Alex Deucher 2012-10-25 19:33:44 UTC
Are you using the new dynamic GPU switching stuff in xserver 1.13 or just the old static switching?
Comment 4 Igor Murzov 2012-10-25 19:58:36 UTC
I use xorg-server-1.12.3, and I'm not switching GPUs while X is running. I tried to power down the unused GPU, and then the kernel crashed. Switching GPUs while X is running doesn't work:

[ 6514.980755] vga_switcheroo: client 0 refused switch
Comment 5 Michel Dänzer 2012-10-26 08:21:53 UTC
Please attach the full dmesg output showing the drm/radeon initialization messages.
Comment 6 Igor Murzov 2012-10-26 12:22:10 UTC
Created attachment 84961
full dmesg output for v3.7.0-rc2+

The kernel is not built from origin/master; it is the drm-fixes-3.7 branch from git://people.freedesktop.org/~agd5f/linux.
Comment 7 Matthieu Baerts 2013-02-17 13:07:48 UTC
Created attachment 93461
Kernel Oops with version 3.8-rc7

Hello,

I also get this crash with kernel version 3.8-rc7 on Ubuntu Raring 13.04.

I added this line to my /etc/rc.local file:
   echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

The crash doesn't happen at every startup, but I guess it depends on whether the command above runs before or after X11 starts.

I'm using this card: Mobility Radeon HD 4650

I'm attaching a file with lines from /var/log/kern.log
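
Editorial sketch (not part of the original comment): if the crash really depends on whether the power-off command runs before or after X starts, a guard like the one below could be used in /etc/rc.local to skip the write while an X server is running. This only avoids the trigger described in this report and does not fix the driver. The function names `has_x_server` and `power_off_inactive_gpu` are hypothetical, and matching on the process names `X` and `Xorg` is an assumption (the oops above shows `comm: X`).

```shell
#!/bin/sh
# Sketch: write OFF to vga_switcheroo only when no X server is running.
# ASSUMPTION: the X server process is named exactly "X" or "Xorg".

# Succeed if the process-name list on stdin contains X or Xorg.
has_x_server() {
    grep -qxE 'X|Xorg'
}

# Power down the inactive GPU unless an X server is running.
# $1: optional path to the switch file (defaults to the debugfs path).
power_off_inactive_gpu() {
    if ps -e -o comm= | has_x_server; then
        echo "X is running; not powering down the inactive GPU" >&2
        return 1
    fi
    echo OFF > "${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
}
```

In rc.local the bare `echo OFF > ...` line would then be replaced by a call to `power_off_inactive_gpu`. Comparing boots where the guard skips the write with boots where it does not would help confirm or rule out the ordering guess.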