Bug 50241

Summary: ttm_get_pages() will OOPS with highmem allocation
Product: Drivers
Reporter: Jonathan Morton (jonathan.morton)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: RESOLVED CODE_FIX
Severity: normal
CC: alan, daniel, florian
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.6
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Small patch to ttm_get_pages()

Description Jonathan Morton 2012-11-07 18:29:55 UTC
Created attachment 85871
Small patch to ttm_get_pages()

I'm developing a graphics driver that uses the TTM subsystem to allocate hardware-visible buffers.  On an IA32 machine with at least 1 GB of RAM, TTM allocation reliably fails with a NULL pointer dereference OOPS in ttm_get_pages().

I was given the attached patch along with the initial version of the driver, and it appears to solve the problem.  I notice that, despite the age of the patch, it is not present in current mainline kernels or in the kernel of a major distribution (Debian).

[261231.137995] BUG: unable to handle kernel NULL pointer dereference at   (null)
[261231.138029] IP: [<f85159f1>] ttm_get_pages+0x289/0x2e3 [ttm]
[261231.138085] *pdpt = 0000000033d30001 *pde = 0000000000000000 
[261231.138109] Oops: 0002 [#1] SMP 
[261231.138127] Modules linked in: nls_utf8 nls_cp437 vfat fat parport_pc ppdev bnep rfcomm bluetooth lp parport uinput nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop joydev arc4 cedarview_gfx(O) snd_hda_codec_realtek rtl8192ce snd_hda_intel rtlwifi snd_hda_codec rtl8192c_common snd_hwdep mac80211 snd_pcm uvcvideo classmate_laptop snd_page_alloc tpm_tis iTCO_wdt coretemp tpm psmouse pcspkr serio_raw snd_seq snd_seq_device snd_timer ttm cfg80211 videodev acpi_cpufreq drm_kms_helper media i2c_i801 iTCO_vendor_support evdev drm rfkill mperf snd tpm_bios battery i2c_algo_bit i2c_core soundcore video ac power_supply button processor usbhid hid ums_realtek ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif usb_storage uas ahci libahci uhci_hcd libata scsi_mod ehci_hcd usbcore usb_common r8169 mii thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[261231.138500] 
[261231.138518] Pid: 18234, comm: mplayer Tainted: G           O 3.2.0-3-686-pae #1 Intel Corporation Intel powered classmate PC/Intel powered classmate PC
[261231.138551] EIP: 0060:[<f85159f1>] EFLAGS: 00210246 CPU: 0
[261231.138586] EIP is at ttm_get_pages+0x289/0x2e3 [ttm]
[261231.138602] EAX: 00000000 EBX: f6bf23e0 ECX: 00000400 EDX: c14ee840
[261231.138617] ESI: f7093d04 EDI: 00000000 EBP: 00000000 ESP: f7093cb0
[261231.138633]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[261231.138652] Process mplayer (pid: 18234, ti=f7092000 task=f5887520 task.ti=f7092000)
[261231.138664] Stack:
[261231.138674]  f62e3a20 f58d1000 000280d0 00000001 f5802440 00000040 00200282 00000040
[261231.138723]  000200d2 f5800140 f7139ac0 00000000 f52f9400 f7139ac0 00000000 f8510a8e
[261231.138767]  00000001 f58d1800 f728e640 00000000 00000000 f6bf23f4 f6bf23f4 00000000
[261231.138812] Call Trace:
[261231.138855]  [<f8510a8e>] ? __ttm_tt_get_page+0x4e/0xd8 [ttm]
[261231.138897]  [<f8510e28>] ? ttm_tt_populate+0x2e/0x5f [ttm]
[261231.138935]  [<f8510e7c>] ? ttm_tt_bind+0x23/0x4f [ttm]
[261231.138975]  [<f8512013>] ? ttm_bo_handle_move_mem+0xf1/0x241 [ttm]
[261231.139016]  [<f8512ba2>] ? ttm_bo_move_buffer+0xbe/0xe3 [ttm]
[261231.139047]  [<c10c0bb2>] ? kmem_cache_alloc_trace+0x69/0x73
[261231.139089]  [<f8512c77>] ? ttm_bo_validate+0xb0/0xf3 [ttm]
[261231.139130]  [<f8512f3c>] ? ttm_bo_init+0x282/0x2b1 [ttm]
[261231.139242]  [<f913095d>] ? ttm_pl_create_ioctl+0x118/0x1ab [cedarview_gfx]
[261231.139353]  [<f9130607>] ? ttm_pl_fill_rep+0x43/0x43 [cedarview_gfx]
[261231.139385]  [<c1028d9b>] ? __kunmap_atomic+0x62/0x6f
[261231.139494]  [<f912f7ec>] ? psb_pl_create_ioctl+0x24/0x28 [cedarview_gfx]
[261231.139552]  [<f848fe17>] ? drm_ioctl+0x256/0x2f6 [drm]
[261231.139664]  [<f912f7c8>] ? psb_pl_reference_ioctl+0xe/0xe [cedarview_gfx]
[261231.139691]  [<c10246dd>] ? arch_flush_lazy_mmu_mode+0x5/0x14
[261231.139714]  [<c1028e85>] ? kmap_atomic_prot+0xcc/0xe0
[261231.139737]  [<c10ad66d>] ? handle_mm_fault+0x1ee/0x1fd
[261231.139793]  [<f848fbc1>] ? drm_copy_field+0x47/0x47 [drm]
[261231.139816]  [<c10d777f>] ? do_vfs_ioctl+0x459/0x48f
[261231.139840]  [<c12c287f>] ? do_page_fault+0x2e0/0x2fc
[261231.139860]  [<c12c286c>] ? do_page_fault+0x2cd/0x2fc
[261231.139883]  [<c1210c8e>] ? sys_recv+0x19/0x1d
[261231.139903]  [<c121128c>] ? sys_socketcall+0x11f/0x1da
[261231.139923]  [<c10d77f9>] ? sys_ioctl+0x44/0x67
[261231.139946]  [<c12c419f>] ? sysenter_do_call+0x12/0x28
[261231.139959] Code: 29 7b 10 31 ed 8b 54 24 18 89 d8 e8 98 9f da c8 83 7c 24 1c 00 74 23 8b 1e eb 15 89 d8 e8 e4 40 b9 c8 b9 00 04 00 00 89 c7 31 c0 <f3> ab 8b 5b 14 83 eb 14 8d 43 14 39 c6 75 e1 31 db 85 ed 74 3b 
[261231.140206] EIP: [<f85159f1>] ttm_get_pages+0x289/0x2e3 [ttm] SS:ESP 0068:f7093cb0
[261231.140252] CR2: 0000000000000000
[261231.140353] ---[ end trace 41883c8094a775b9 ]---
Comment 1 Alan 2012-11-08 14:13:29 UTC
 cedarview_gfx(O)

That looks to me like the closed user-space Intel graphics driver for Cedartrail devices?

It seems the patch was only ever posted to MeeGo-specific lists, so it is no surprise that nobody ever looked at it. It needs to be submitted to the dri-devel list for discussion.
Comment 2 Daniel Vetter 2012-11-13 12:23:02 UTC
No upstream intel gfx driver ever used TTM, so this is definitely for the closed source (userspace) stack. Closing as WONTFIX.
Comment 3 Alan 2012-11-13 17:20:58 UTC
The only question I have is whether this fix is relevant to any upstream non-Intel graphics driver.
Comment 4 Jonathan Morton 2012-11-13 17:29:07 UTC
My questions are:

1) If the bugfix is valid, why not apply it?  It seems to be low-risk.  If it's not valid, please explain why.

2) If it is valid and yet not to be applied, why not completely remove the subsystem?  External drivers that need it can then provide their own version of it.

Incidentally, the driver I am working on does indeed have closed-source userspace, but the kernel side is GPLed.  I am therefore not sure why it is marked as tainted.
Comment 5 Daniel Vetter 2012-11-13 17:33:36 UTC
Well, you can always submit the ttm patch directly to dri-devel. But I'm no ttm expert, so can't judge whether it's the right thing. And ttm is ridiculously complex ...
Comment 6 Jonathan Morton 2012-11-13 17:36:08 UTC
It seems that Radeon driver(s) also use TTM and the hardware supports high memory access, so it is potentially relevant.
Comment 7 Jonathan Morton 2012-11-13 17:40:24 UTC
(In reply to comment #5)
> Well, you can always submit the ttm patch directly to dri-devel. But I'm no
> ttm expert, so can't judge whether it's the right thing. And ttm is
> ridiculously complex ...

The calls involved (to clear_page and clear_highpage) don't look TTM specific to me.  It seems to me that any time TTM allocates from highmem, this bug would be triggered.  Unfortunately, not all PCs with >800MB RAM run 64-bit kernels.

But if dri-devel is the right place to get this patch in, I'll send it there.
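
For illustration, the difference between the two calls can be modelled in user space. This is only a sketch and not the attached patch: every `sketch_` name and the struct below are hypothetical stand-ins, and the only kernel behaviour assumed is that page_address() returns NULL for a highmem page with no kernel mapping, while clear_highpage() maps the page temporarily before zeroing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the kernel's struct page. */
struct sketch_page {
    unsigned char *backing; /* memory that represents the page contents */
    bool highmem;           /* true: page has no permanent kernel mapping */
};

/* Models page_address(): an unmapped highmem page has no address. */
void *sketch_page_address(const struct sketch_page *p)
{
    return p->highmem ? NULL : p->backing;
}

/* Models clear_highpage(): map temporarily, zero, then unmap. */
void sketch_clear_highpage(struct sketch_page *p)
{
    void *vaddr = p->backing; /* stands in for a temporary kmap */
    memset(vaddr, 0, SKETCH_PAGE_SIZE);
    /* the temporary mapping would be dropped here */
}

/*
 * Unconditional variant: always goes through page_address().
 * Returns false where the kernel would write through a NULL pointer.
 */
bool sketch_zero_page_unchecked(struct sketch_page *p)
{
    void *vaddr = sketch_page_address(p);

    if (vaddr == NULL)
        return false; /* in the kernel this is the OOPS */
    memset(vaddr, 0, SKETCH_PAGE_SIZE);
    return true;
}

/* Checked variant: pick the clearing routine by page type. */
bool sketch_zero_page_checked(struct sketch_page *p)
{
    if (p->highmem) {
        sketch_clear_highpage(p);
        return true;
    }
    return sketch_zero_page_unchecked(p);
}
```

With a page marked highmem, sketch_zero_page_unchecked() reports failure and leaves the buffer untouched, whereas sketch_zero_page_checked() zeroes it. For a lowmem page both variants behave the same.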
Comment 8 Florian Mickler 2012-11-17 20:14:37 UTC
A patch referencing this bug report has been merged in Linux v3.7-rc6:

commit ac207ed2471150e06af0afc76e4becc701fa2733
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Tue Nov 13 18:31:55 2012 +0000

    ttm: Clear the ttm page allocated from high memory zone correctly