Subject : [BISECTED] drm: random hang since 620f378 "drm: prune modes when ..." Submitter : Arnd Bergmann <arnd@arndb.de> Date : 2009-12-07 17:30 References : http://marc.info/?l=linux-kernel&m=126020704125723&w=4 Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org> This entry is being used for tracking a regression from 2.6.31. Please don't close it until the problem is fixed in the mainline.
Hi, Bergmann Will you please try to add the boot option of "drm.debug=0x06" on the latest linux kernel(2.6.33-rc2) and see whether the issue still exists? Please also attach the output of dmesg. Thanks.
Sorry, I'm still traveling and haven't been able to test more variations of the problem on my home machine. I hope to nail it down further once I get home in February. The 2.6.33-rc kernel and the patched i915 tree were still crashing when I last tried, but the first broken version (the i915 branch that got merged into 2.6.32-rc1) with the same patch applied seemed not to (hard to tell), so there may be more than one place in the code causing the same symptoms with 2.6.32.
Im OpenSuSE 11.2 64bit user. and everything was fine until 2.6.32.6 kernel from http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.2 was updated to 2.6.33.rc5 (and then rc6). Xserver 1.7.4 and video drivers from here http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_11.2/ Very easy to catch this bug if enable compositing and run something graphic extensive like gtkperf (after few runs) Without compositing system can work stable for a quite long time, even whole day, but sometimes crashes anyway. Last messages after hang: Feb 1 17:59:17 C0050 kernel: [ 483.616212] [drm:i915_add_request], 414449 Feb 1 17:59:17 C0050 kernel: [ 483.616216] [drm:i915_add_request], 414450 Feb 1 17:59:17 C0050 kernel: [ 484.116005] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:18 C0050 kernel: [ 484.349189] [drm:intel_crtc_cursor_set], Feb 1 17:59:18 C0050 kernel: [ 484.349193] [drm:intel_crtc_cursor_set], cursor off Feb 1 17:59:18 C0050 kernel: [ 484.367139] [drm:i915_add_request], 414451 Feb 1 17:59:18 C0050 kernel: [ 484.367147] [drm:i915_add_request], 414452 Feb 1 17:59:18 C0050 kernel: [ 484.367327] [drm:drm_crtc_helper_set_config], Feb 1 17:59:18 C0050 kernel: [ 484.367330] [drm:drm_crtc_helper_set_config], crtc: ffff880037eed000 3 fb: ffff88007dd32300 connectors: ffff88007ddac740 num_connectors: 1 (x, y) (0, 0) Feb 1 17:59:18 C0050 kernel: [ 484.367340] [drm:drm_crtc_helper_set_config], setting connector 5 crtc to ffff880037eed000 Feb 1 17:59:18 C0050 kernel: [ 484.367345] [drm:intel_pipe_set_base], Writing base 007DF000 00000000 0 0 Feb 1 17:59:18 C0050 kernel: [ 484.388113] [drm:drm_crtc_helper_set_config], Feb 1 17:59:18 C0050 kernel: [ 484.388116] [drm:drm_crtc_helper_set_config], crtc: ffff880037eed000 3 fb: ffff88007dd32300 connectors: ffff88007ddac740 num_connectors: 1 (x, y) (0, 0) Feb 1 17:59:18 C0050 kernel: [ 484.388123] [drm:drm_crtc_helper_set_config], setting connector 5 crtc to ffff880037eed000 Feb 1 17:59:18 C0050 kernel: [ 484.616010] [drm:intel_crtc_idle_timer], idle timer fired, downclocking Feb 1 17:59:18 C0050 kernel: [ 485.009006] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:19 C0050 kernel: [ 485.983008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:20 C0050 kernel: [ 486.984005] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:21 C0050 kernel: [ 487.984008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:23 C0050 kernel: [ 489.984008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:24 C0050 kernel: [ 490.984006] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:25 C0050 kernel: [ 491.517007] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:26 C0050 kernel: [ 492.985007] [drm:intel_gpu_idle_timer], idle timer fired, downclocking Feb 1 17:59:27 C0050 kernel: [ 493.516008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking or even: Feb 1 18:11:57 C0050 kernel: [ 194.282297] [drm:i915_add_request], 31771 Feb 1 18:11:57 C0050 kernel: [ 194.282301] [drm:i915_add_request], 31772 Feb 1 18:11:57 C0050 kernel: [ 194.282376] [drm:i915_add_request], 31773 Feb 1 18:11:57 C0 i915.powersave=0 doesn't help
(In reply to comment #3) > Im OpenSuSE 11.2 64bit user. and everything was fine until 2.6.32.6 kernel > from > http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.2 was > updated to 2.6.33.rc5 (and then rc6). Xserver 1.7.4 and video drivers from > here > http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_11.2/ > > Very easy to catch this bug if enable compositing and run something graphic > extensive like gtkperf (after few runs) Without compositing system can work > stable for a quite long time, even whole day, but sometimes crashes anyway. May be a different problem from Arnd's, but can you bisect and figure out which commit causes your hang?
Im sorry. I have no experience of building kernel from sources. I get kernels only from repos. All i can say: now ive installed 2.6.32.7 kernel - and it seems to work stable again (only "[drm:i915_gem_execbuffer] *ERROR* Object ffff88007e71b680 appears more than once in object list" messages sometimes appear in dmesg) And about 2.6.33. Later i've noticed that even when AIGLX and compositing off and system doesnt crush there are several stack traces in dmesg: Feb 9 15:28:52 C0050 kernel: [103911.029907] Xorg: page allocation failure. order:1, mode:0x10d0, alloc_flags:0x40 pflags:0x2100 Feb 9 15:28:52 C0050 kernel: [103911.029913] Pid: 23634, comm: Xorg Not tainted 2.6.33-rc6-11-desktop #1 Feb 9 15:28:52 C0050 kernel: [103911.029916] Call Trace: Feb 9 15:28:52 C0050 kernel: [103911.029933] [<ffffffff81006109>] dump_trace+0x79/0x340 Feb 9 15:28:52 C0050 kernel: [103911.029940] [<ffffffff814957f1>] dump_stack+0x69/0x6f Feb 9 15:28:52 C0050 kernel: [103911.029948] [<ffffffff810deb7d>] __alloc_pages_slowpath+0x3ed/0x570 Feb 9 15:28:52 C0050 kernel: [103911.029955] [<ffffffff810deea7>] __alloc_pages_nodemask+0x1a7/0x1b0 Feb 9 15:28:52 C0050 kernel: [103911.029962] [<ffffffff81116f05>] kmem_getpages+0x55/0x160 Feb 9 15:28:52 C0050 kernel: [103911.029968] [<ffffffff81117f8a>] fallback_alloc+0x14a/0x220 Feb 9 15:28:52 C0050 kernel: [103911.029973] [<ffffffff8111842c>] __kmalloc+0x14c/0x280 Feb 9 15:28:52 C0050 kernel: [103911.029980] [<ffffffff812ee98a>] agp_alloc_page_array+0x5a/0x70 Feb 9 15:28:52 C0050 kernel: [103911.029986] [<ffffffff812eea07>] agp_create_user_memory+0x67/0x150 Feb 9 15:28:52 C0050 kernel: [103911.029991] [<ffffffff812eeb00>] agp_generic_alloc_user+0x10/0x60 Feb 9 15:28:52 C0050 kernel: [103911.029997] [<ffffffff812ef062>] agp_allocate_memory+0xa2/0x150 Feb 9 15:28:52 C0050 kernel: [103911.030021] [<ffffffffa0280aff>] drm_agp_bind_pages+0x4f/0xf0 [drm] Feb 9 15:28:52 C0050 kernel: [103911.030054] [<ffffffffa030a10b>] i915_gem_object_bind_to_gtt+0x12b/0x320 [i915] Feb 9 15:28:52 C0050 kernel: [103911.030077] [<ffffffffa030a3a5>] i915_gem_object_pin+0xa5/0xc0 [i915] Feb 9 15:28:52 C0050 kernel: [103911.030100] [<ffffffffa030b426>] i915_gem_object_pin_and_relocate+0x56/0x360 [i915] Feb 9 15:28:52 C0050 kernel: [103911.030123] [<ffffffffa030bc8f>] i915_gem_do_execbuffer+0x55f/0xab0 [i915] Feb 9 15:28:52 C0050 kernel: [103911.030159] [<ffffffffa030c95d>] i915_gem_execbuffer+0x1bd/0x370 [i915] Feb 9 15:28:52 C0050 kernel: [103911.030185] [<ffffffffa02799a8>] drm_ioctl+0x2e8/0x440 [drm] Feb 9 15:28:52 C0050 kernel: [103911.030197] [<ffffffff81133815>] vfs_ioctl+0x35/0xd0 Feb 9 15:28:52 C0050 kernel: [103911.030205] [<ffffffff81133d28>] do_vfs_ioctl+0x78/0x420 Feb 9 15:28:52 C0050 kernel: [103911.030213] [<ffffffff81134168>] sys_ioctl+0x98/0x110 Feb 9 15:28:52 C0050 kernel: [103911.030222] [<ffffffff81002ffb>] system_call_fastpath+0x16/0x1b Feb 9 15:28:52 C0050 kernel: [103911.030237] [<00007f7825fee7e7>] 0x7f7825fee7e7 Feb 9 15:28:52 C0050 kernel: [103911.030241] Mem-Info: Feb 9 15:28:52 C0050 kernel: [103911.030245] Node 0 DMA per-cpu: Feb 9 15:28:52 C0050 kernel: [103911.030251] CPU 0: hi: 0, btch: 1 usd: 0 Feb 9 15:28:52 C0050 kernel: [103911.030256] CPU 1: hi: 0, btch: 1 usd: 0 Feb 9 15:28:52 C0050 kernel: [103911.030260] Node 0 DMA32 per-cpu: Feb 9 15:28:52 C0050 kernel: [103911.030266] CPU 0: hi: 186, btch: 31 usd: 0 Feb 9 15:28:52 C0050 kernel: [103911.030270] CPU 1: hi: 186, btch: 31 usd: 0 Feb 9 15:28:52 C0050 kernel: [103911.030278] active_anon:332828 inactive_anon:124449 isolated_anon:25 Feb 9 15:28:52 C0050 kernel: [103911.030280] active_file:9893 inactive_file:8868 isolated_file:7 Feb 9 15:28:52 C0050 kernel: [103911.030283] unevictable:1 dirty:5 writeback:2430 unstable:0 Feb 9 15:28:52 C0050 kernel: [103911.030286] free:3735 slab_reclaimable:6239 slab_unreclaimable:8780 Feb 9 15:28:52 C0050 kernel: [103911.030288] mapped:14806 shmem:43822 pagetables:7599 bounce:0 Feb 9 15:28:52 C0050 kernel: [103911.030293] Node 0 DMA free:8008kB min:40kB low:48kB high:60kB active_anon:2304kB inactive_anon:2384kB active_file:940kB inactive_file:1076kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15296kB mlocked:0kB dirty:0kB writeback:0kB mapped:1044kB shmem:1172kB slab_reclaimable:1004kB slab_unreclaimable:48kB kernel_stack:8kB pagetables:20kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Feb 9 15:28:52 C0050 kernel: [103911.030315] lowmem_reserve[]: 0 1996 1996 1996 Feb 9 15:28:52 C0050 kernel: [103911.030326] Node 0 DMA32 free:6932kB min:5696kB low:7120kB high:8544kB active_anon:1329008kB inactive_anon:495412kB active_file:38632kB inactive_file:34396kB unevictable:4kB isolated(anon):100kB isolated(file):28kB present:2043984kB mlocked:4kB dirty:20kB writeback:9720kB mapped:58180kB shmem:174116kB slab_reclaimable:23952kB slab_unreclaimable:35072kB kernel_stack:3208kB pagetables:30376kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Feb 9 15:28:52 C0050 kernel: [103911.030349] lowmem_reserve[]: 0 0 0 0 Feb 9 15:28:52 C0050 kernel: [103911.030359] Node 0 DMA: 38*4kB 14*8kB 4*16kB 2*32kB 3*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8008kB Feb 9 15:28:52 C0050 kernel: [103911.030386] Node 0 DMA32: 1477*4kB 40*8kB 6*16kB 3*32kB 2*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6932kB Feb 9 15:28:52 C0050 kernel: [103911.030413] 121342 total pagecache pages Feb 9 15:28:52 C0050 kernel: [103911.030417] 58765 pages in swap cache Feb 9 15:28:52 C0050 kernel: [103911.030421] Swap cache stats: add 3404291, delete 3345526, find 737207/1110937 Feb 9 15:28:52 C0050 kernel: [103911.030426] Free swap = 397628kB Feb 9 15:28:52 C0050 kernel: [103911.030429] Total swap = 1542200kB Feb 9 15:28:52 C0050 kernel: [103911.038666] 522160 pages RAM Feb 9 15:28:52 C0050 kernel: [103911.038669] 10212 pages reserved Feb 9 15:28:52 C0050 kernel: [103911.038671] 90616 pages shared Feb 9 15:28:53 C0050 kernel: [103911.038673] 481352 pages non-shared
Glad it's stable at least. The page allocation failures look like they're due to a low memory situation, or your memory is highly fragmented so that even an order 1 allocation (2 pages or 8k) is failing. Does the latest -rc still have this problem? Maybe you're running into a memory leak?
Sorry i cant use my work computer for stability tests anymore. And i havent seen any changes about i915 between rc6-rc7. So i better will wait for 2.6.33 release or at least some commits containing i915 fixes. And Maybe Arnd Bergmann will give your some helpful information. PS: with both kernels (2.6.32.7 and 2.6.33.rc6) Xorg may stop drawing new frames. All applications are still working and even cursor moves and i can switch to terminal and back. And many lines about failing to flush some buffer appear in Xorg.log.. I havent tested new 2.6.32.8 yet - it contains some fixes for i915.
Ah. I wasnt able to hold myself from testing 2.6.33-rc7 kernel (from that SuSE repo). And... it is miracle. I cant hang my system now for a few hours. Dozens gtkperf tests was passed with AIGLX compositing enabled in KDE4.4 environment with all effects enabled. I dont know what is really responsible for this good change. All i have updated instead of kernel is KDE from 4.3 to 4.4 and pixman from 0.17.2 to 0.17.4 and some other Xorg packages. dmesg and Xorg.log both clear from any errors.. But of course its to early to celebrate. I can give some for tests to this PC on the next week.
Thanks for the update Ildar. I'll tentatively close this one; hope it was fixed by a 2D driver or Mesa driver fix in your update. Please re-open if you find the same hang on your PC later.