Bug 83611

Summary: Kernel NULL pointer dereference when using tlp on a laptop with AMD video card.
Product: Drivers
Reporter: yshuiv7
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: high
CC: alexdeucher, frederik
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.16.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: tlp udev rule file, possible fix

Description yshuiv7 2014-08-31 19:36:15 UTC
Relevant dmesg output:

ACPI: \_SB_.PCI0.LPCB.EC0_.ECRD: 1 arguments were passed to a non-method ACPI object (RegionField) (20140424/nsarguments-230)
ACPI: \_SB_.PCI0.LPCB.EC0_.ECRD: 1 arguments were passed to a non-method ACPI object (RegionField) (20140424/nsarguments-230)
input: ThinkPad Extra Buttons as /devices/platform/thinkpad_acpi/input/input14
ATOM BIOS: Lenovo/Compal
[drm] GPU not posted. posting now...
radeon 0000:05:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
radeon 0000:05:00.0: GTT: 1024M 0x0000000080000000 - 0x00000000BFFFFFFF
[drm] Detected VRAM RAM=2048M, BAR=256M
[drm] RAM width 128bits DDR
[TTM] Zone  kernel: Available graphics memory: 4049972 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 2048M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
[drm] Loading VERDE Microcode
[drm] radeon/VERDE_mc2.bin: 31500 bytes
[drm] Internal thermal controller without fan control
[drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[drm] radeon: dpm initialized
[drm] GART: num cpu pages 262144, num gpu pages 262144
[drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
radeon 0000:05:00.0: WB enabled
radeon 0000:05:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff88024d04fc00
radeon 0000:05:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff88024d04fc04
radeon 0000:05:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff88024d04fc08
radeon 0000:05:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff88024d04fc0c
radeon 0000:05:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff88024d04fc10
radeon 0000:05:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc900121b5a18
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
radeon 0000:05:00.0: irq 68 for MSI/MSI-X
radeon 0000:05:00.0: radeon: using MSI.
[drm] radeon: irq initialized.
32]: wlp4s0: soliciting a DHCP lease
BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffffa060aefb>] dce6_bandwidth_update+0x4b/0x110 [radeon]
PGD 252511067 PUD 24ee99067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
Modules linked in: ses enclosure ecb btusb uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common uas bluetooth videodev thinkpad_acpi nvram media usb_storage 6lowpan_iphc crc16 intel_rapl joydev mousedev x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd arc4 iTCO_wdt iwlmvm mac80211 iTCO_vendor_support i915(+) radeon(+) mac_hid microcode serio_raw psmouse evdev pcspkr ttm iwlwifi dw_dmac dw_dmac_core drm_kms_helper thermal battery cfg80211 gpio_lynxpoint drm ideapad_laptop r8169 rtsx_pci_ms memstick sparse_keymap rfkill mii fan 8250_dw video mei_me mei ac i2c_hid spi_pxa2xx_platform i2c_designware_platform intel_gtt i2c_i801 i2c_algo_bit i2c_designware_core
 i2c_core button shpchp lpc_ich snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore processor coretemp hwmon acpi_call(O) btrfs xor hid_generic usbhid hid raid6_pq sd_mod crc_t10dif crct10dif_common rtsx_pci_sdmmc atkbd libps2 ahci libahci crc32c_intel libata scsi_mod ehci_pci ehci_hcd xhci_hcd usbcore rtsx_pci usb_common i8042 serio sdhci_acpi sdhci led_class mmc_core
CPU: 0 PID: 407 Comm: tlp Tainted: G           O  3.16.1-1-ARCH #1
Hardware name: LENOVO 20347/Lenovo Y40-70, BIOS 99CN24WW(V1.07) 07/28/2014
task: ffff88024d1028c0 ti: ffff88024ecb8000 task.ti: ffff88024ecb8000
RIP: 0010:[<ffffffffa060aefb>]  [<ffffffffa060aefb>] dce6_bandwidth_update+0x4b/0x110 [radeon]
RSP: 0018:ffff88024ecbbdf0  EFLAGS: 00010246
RAX: ffff88024ef68498 RBX: ffff88024ef68000 RCX: ffff88024ef684c8
RDX: 0000000000000000 RSI: 00000000000004b0 RDI: ffff88024ef68000
RBP: ffff88024ecbbe20 R08: ffff88024d2699e6 R09: 00000000000004b0
R10: 0000000000000005 R11: ffffea00026a1500 R12: ffff88024ef68000
R13: 0000000000000000 R14: ffff88024ef69738 R15: ffff88024ef69048
FS:  00007f869e36c700(0000) GS:ffff88025f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 000000025256b000 CR4: 00000000001407f0
Stack:
 000000008830c49b ffff88024ef68000 ffff88024ef69710 0000000000000000
 ffff88024ef69738 ffff88024ef69048 ffff88024ecbbe60 ffffffffa05e824e
 ffff88024ecbbf08 ffff88024ef68000 000000000000000c ffff88009acd44b0
Call Trace:
 [<ffffffffa05e824e>] radeon_pm_compute_clocks+0x62e/0x8f0 [radeon]
 [<ffffffffa05e8ad7>] radeon_set_dpm_state+0x87/0x100 [radeon]
 [<ffffffff8139f978>] dev_attr_store+0x18/0x30
 [<ffffffff812384da>] sysfs_kf_write+0x3a/0x50
 [<ffffffff81237a4e>] kernfs_fop_write+0xee/0x180
 [<ffffffff811c1f87>] vfs_write+0xb7/0x200
 [<ffffffff811c2bf9>] SyS_write+0x59/0xd0
 [<ffffffff811df9c2>] ? __close_fd+0x82/0xa0
 [<ffffffff81530be9>] system_call_fastpath+0x16/0x1b
Code: 94 24 68 20 00 00 85 d2 0f 8e d1 00 00 00 83 ea 01 49 8d 84 24 98 04 00 00 45 31 ed 49 8d 8c d4 a0 04 00 00 0f 1f 40 00 48 8b 10 <80> ba 80 00 00 00 01 41 83 dd ff 48 83 c0 08 48 39 c8 75 e9 49 
RIP  [<ffffffffa060aefb>] dce6_bandwidth_update+0x4b/0x110 [radeon]
 RSP <ffff88024ecbbdf0>
CR2: 0000000000000080
---[ end trace 7264449ae2a0d879 ]---
Comment 1 yshuiv7 2014-08-31 19:44:43 UTC
I think there is probably a race condition here. When tlp is started automatically via udev using the attached rule file (40-tlp.rules), the bug occurs. When I remove the rule file and start tlp after the system has fully booted, everything works fine.
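
The oops above seems consistent with this. The faulting bytes in the Code line (`48 8b 10` followed by `<80> ba 80 00 00 00 01`) load a pointer from the address in RAX into RDX and then compare a byte at offset 0x80 from it. RDX is 0 and CR2 is 0x80, so the loaded pointer was NULL. Below is a minimal userspace sketch of that failure mode, assuming the function walks an array of CRTC pointers that modeset init has not filled in yet. The names `demo_crtc`, `demo_device` and `count_enabled_heads` are hypothetical stand-ins, not the radeon driver's structures, and the NULL check is only one possible guard, not necessarily the fix that should go into the driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_CRTCS 6

/* Hypothetical stand-ins; NOT the real radeon structure layout. */
struct demo_crtc {
    bool enabled;
};

struct demo_device {
    int num_crtc;                            /* assumed to be known early */
    struct demo_crtc *crtcs[DEMO_MAX_CRTCS]; /* assumed to be filled in later */
};

/*
 * Count enabled heads. Slots that are still NULL (not yet set up) are
 * skipped instead of being dereferenced.
 */
int count_enabled_heads(const struct demo_device *dev)
{
    int heads = 0;

    for (int i = 0; i < dev->num_crtc; i++) {
        const struct demo_crtc *crtc = dev->crtcs[i];

        if (crtc != NULL && crtc->enabled)
            heads++;
    }
    return heads;
}
```

Without the `crtc != NULL` test, calling this on a device whose slots are still empty would read through a NULL pointer, which is the pattern the trace suggests.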
Comment 2 yshuiv7 2014-08-31 19:53:31 UTC
Created attachment 148941
tlp udev rule file
Comment 3 yshuiv7 2014-08-31 21:18:47 UTC
Maybe we could acquire the pm mutex while setting up crtcs in radeon_modeset_init?
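
A rough userspace sketch of that idea, using a pthread mutex in place of the driver's pm mutex. Everything here is hypothetical (`demo_device`, `demo_modeset_init`, `demo_set_dpm_state`), and it assumes the CRTC count is published together with the pointers while the lock is held. The real driver may order its initialization differently, in which case the lock alone would only narrow the window.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_CRTC 2

/* Hypothetical stand-ins; NOT the real radeon structures. */
struct demo_crtc {
    bool enabled;
};

struct demo_device {
    pthread_mutex_t pm_mutex;               /* plays the role of the pm mutex */
    int num_crtc;                           /* 0 until modeset init finishes */
    struct demo_crtc *crtcs[DEMO_NUM_CRTC];
    struct demo_crtc storage[DEMO_NUM_CRTC];
};

void demo_device_init(struct demo_device *dev)
{
    pthread_mutex_init(&dev->pm_mutex, NULL);
    dev->num_crtc = 0;
    for (int i = 0; i < DEMO_NUM_CRTC; i++)
        dev->crtcs[i] = NULL;
}

/* Set up the CRTCs while holding the lock, then publish the count. */
void demo_modeset_init(struct demo_device *dev)
{
    pthread_mutex_lock(&dev->pm_mutex);
    for (int i = 0; i < DEMO_NUM_CRTC; i++) {
        dev->storage[i].enabled = (i == 0); /* pretend only head 0 is lit */
        dev->crtcs[i] = &dev->storage[i];
    }
    dev->num_crtc = DEMO_NUM_CRTC;
    pthread_mutex_unlock(&dev->pm_mutex);
}

/*
 * Stand-in for the sysfs write path: takes the same lock, so it sees
 * either no CRTCs at all or a fully populated array.
 * Returns the number of enabled heads it observed.
 */
int demo_set_dpm_state(struct demo_device *dev)
{
    int heads = 0;

    pthread_mutex_lock(&dev->pm_mutex);
    for (int i = 0; i < dev->num_crtc; i++) {
        if (dev->crtcs[i]->enabled)
            heads++;
    }
    pthread_mutex_unlock(&dev->pm_mutex);
    return heads;
}
```

A write that arrives before `demo_modeset_init` has run sees `num_crtc == 0` and touches nothing. A write that arrives during setup blocks until the array is complete.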
Comment 4 frederik 2014-11-03 10:00:16 UTC
I experienced the same bug with 3.17.2, triggered by laptop-mode-tools: https://bugs.freedesktop.org/show_bug.cgi?id=85771
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767742
Comment 5 Alex Deucher 2014-11-03 15:03:05 UTC
Created attachment 156391
possible fix

Does this patch help?