Bug 14939 - drm: random hang with i915
Summary: drm: random hang with i915
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel) (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Jesse Barnes
URL:
Keywords:
Depends on:
Blocks: 14230
  Show dependency tree
 
Reported: 2009-12-29 13:15 UTC by Rafael J. Wysocki
Modified: 2010-02-12 23:50 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Rafael J. Wysocki 2009-12-29 13:15:31 UTC
Subject    : [BISECTED] drm: random hang since 620f378 "drm: prune modes when ..."
Submitter  : Arnd Bergmann <arnd@arndb.de>
Date       : 2009-12-07 17:30
References : http://marc.info/?l=linux-kernel&m=126020704125723&w=4
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>

This entry is being used for tracking a regression from 2.6.31.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 ykzhao 2010-01-02 13:40:16 UTC
Hi, Bergmann
   Will you please try to add the boot option of "drm.debug=0x06" on the latest linux kernel(2.6.33-rc2) and see whether the issue still exists? Please also attach the output of dmesg.

   Thanks.
Comment 2 Arnd Bergmann 2010-01-27 01:56:12 UTC
Sorry, I'm still traveling and haven't been able to test more variations of the problem on my home machine. I hope to nail it down further once I get home in February. The 2.6.33-rc kernel and the patched i915 tree were still crashing when I last tried, but the first broken version (the i915 branch that got merged into 2.6.32-rc1) with the same patch applied seemed not to (hard to tell), so there may be more than one place in the code causing the same symptoms with 2.6.32.
Comment 3 Ildar Nurislamov 2010-02-01 15:41:34 UTC
Im OpenSuSE 11.2 64bit user. and everything was fine until 2.6.32.6 kernel from http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.2 was updated to 2.6.33.rc5 (and then rc6). Xserver 1.7.4 and video drivers from here http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_11.2/

Very easy to catch this bug if enable compositing and run something graphic extensive like gtkperf (after few runs) Without compositing system can work stable for a quite long time, even whole day, but sometimes crashes anyway.

Last messages after hang:
Feb  1 17:59:17 C0050 kernel: [  483.616212] [drm:i915_add_request], 414449
Feb  1 17:59:17 C0050 kernel: [  483.616216] [drm:i915_add_request], 414450
Feb  1 17:59:17 C0050 kernel: [  484.116005] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:18 C0050 kernel: [  484.349189] [drm:intel_crtc_cursor_set], 
Feb  1 17:59:18 C0050 kernel: [  484.349193] [drm:intel_crtc_cursor_set], cursor off
Feb  1 17:59:18 C0050 kernel: [  484.367139] [drm:i915_add_request], 414451
Feb  1 17:59:18 C0050 kernel: [  484.367147] [drm:i915_add_request], 414452
Feb  1 17:59:18 C0050 kernel: [  484.367327] [drm:drm_crtc_helper_set_config], 
Feb  1 17:59:18 C0050 kernel: [  484.367330] [drm:drm_crtc_helper_set_config], crtc: ffff880037eed000 3 fb: ffff88007dd32300 connectors: ffff88007ddac740 num_connectors: 1 (x, y) (0, 0)
Feb  1 17:59:18 C0050 kernel: [  484.367340] [drm:drm_crtc_helper_set_config], setting connector 5 crtc to ffff880037eed000
Feb  1 17:59:18 C0050 kernel: [  484.367345] [drm:intel_pipe_set_base], Writing base 007DF000 00000000 0 0
Feb  1 17:59:18 C0050 kernel: [  484.388113] [drm:drm_crtc_helper_set_config], 
Feb  1 17:59:18 C0050 kernel: [  484.388116] [drm:drm_crtc_helper_set_config], crtc: ffff880037eed000 3 fb: ffff88007dd32300 connectors: ffff88007ddac740 num_connectors: 1 (x, y) (0, 0)
Feb  1 17:59:18 C0050 kernel: [  484.388123] [drm:drm_crtc_helper_set_config], setting connector 5 crtc to ffff880037eed000
Feb  1 17:59:18 C0050 kernel: [  484.616010] [drm:intel_crtc_idle_timer], idle timer fired, downclocking
Feb  1 17:59:18 C0050 kernel: [  485.009006] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:19 C0050 kernel: [  485.983008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:20 C0050 kernel: [  486.984005] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:21 C0050 kernel: [  487.984008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:23 C0050 kernel: [  489.984008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:24 C0050 kernel: [  490.984006] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:25 C0050 kernel: [  491.517007] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:26 C0050 kernel: [  492.985007] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
Feb  1 17:59:27 C0050 kernel: [  493.516008] [drm:intel_gpu_idle_timer], idle timer fired, downclocking


or even:

Feb  1 18:11:57 C0050 kernel: [  194.282297] [drm:i915_add_request], 31771
Feb  1 18:11:57 C0050 kernel: [  194.282301] [drm:i915_add_request], 31772
Feb  1 18:11:57 C0050 kernel: [  194.282376] [drm:i915_add_request], 31773
Feb  1 18:11:57 C0


i915.powersave=0 doesn't help
Comment 4 Jesse Barnes 2010-02-05 18:53:41 UTC
(In reply to comment #3)
> Im OpenSuSE 11.2 64bit user. and everything was fine until 2.6.32.6 kernel
> from
> http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.2 was
> updated to 2.6.33.rc5 (and then rc6). Xserver 1.7.4 and video drivers from
> here
> http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_11.2/
> 
> Very easy to catch this bug if enable compositing and run something graphic
> extensive like gtkperf (after few runs) Without compositing system can work
> stable for a quite long time, even whole day, but sometimes crashes anyway.

May be a different problem from Arnd's, but can you bisect and figure out which commit causes your hang?
Comment 5 Ildar Nurislamov 2010-02-10 08:52:49 UTC
Im sorry. I have no experience of building kernel from sources. I get kernels only from repos. All i can say: now ive installed 2.6.32.7 kernel - and it seems to work stable again (only "[drm:i915_gem_execbuffer] *ERROR* Object ffff88007e71b680 appears more than once in object list" messages sometimes appear in dmesg)
And about 2.6.33. Later i've noticed that even when AIGLX and compositing off and system doesnt crush there are several stack traces in dmesg:

Feb  9 15:28:52 C0050 kernel: [103911.029907] Xorg: page allocation failure. order:1, mode:0x10d0, alloc_flags:0x40 pflags:0x2100                                                
Feb  9 15:28:52 C0050 kernel: [103911.029913] Pid: 23634, comm: Xorg Not tainted 2.6.33-rc6-11-desktop #1                                                                        
Feb  9 15:28:52 C0050 kernel: [103911.029916] Call Trace:                                                                                                                        
Feb  9 15:28:52 C0050 kernel: [103911.029933]  [<ffffffff81006109>] dump_trace+0x79/0x340                                                                                        
Feb  9 15:28:52 C0050 kernel: [103911.029940]  [<ffffffff814957f1>] dump_stack+0x69/0x6f                                                                                         
Feb  9 15:28:52 C0050 kernel: [103911.029948]  [<ffffffff810deb7d>] __alloc_pages_slowpath+0x3ed/0x570                                                                           
Feb  9 15:28:52 C0050 kernel: [103911.029955]  [<ffffffff810deea7>] __alloc_pages_nodemask+0x1a7/0x1b0                                                                           
Feb  9 15:28:52 C0050 kernel: [103911.029962]  [<ffffffff81116f05>] kmem_getpages+0x55/0x160                                                                                     
Feb  9 15:28:52 C0050 kernel: [103911.029968]  [<ffffffff81117f8a>] fallback_alloc+0x14a/0x220                                                                                   
Feb  9 15:28:52 C0050 kernel: [103911.029973]  [<ffffffff8111842c>] __kmalloc+0x14c/0x280                                                                                        
Feb  9 15:28:52 C0050 kernel: [103911.029980]  [<ffffffff812ee98a>] agp_alloc_page_array+0x5a/0x70                                                                               
Feb  9 15:28:52 C0050 kernel: [103911.029986]  [<ffffffff812eea07>] agp_create_user_memory+0x67/0x150                                                                            
Feb  9 15:28:52 C0050 kernel: [103911.029991]  [<ffffffff812eeb00>] agp_generic_alloc_user+0x10/0x60                                                                             
Feb  9 15:28:52 C0050 kernel: [103911.029997]  [<ffffffff812ef062>] agp_allocate_memory+0xa2/0x150                                                                               
Feb  9 15:28:52 C0050 kernel: [103911.030021]  [<ffffffffa0280aff>] drm_agp_bind_pages+0x4f/0xf0 [drm]                                                                           
Feb  9 15:28:52 C0050 kernel: [103911.030054]  [<ffffffffa030a10b>] i915_gem_object_bind_to_gtt+0x12b/0x320 [i915]                                                               
Feb  9 15:28:52 C0050 kernel: [103911.030077]  [<ffffffffa030a3a5>] i915_gem_object_pin+0xa5/0xc0 [i915]                                                                         
Feb  9 15:28:52 C0050 kernel: [103911.030100]  [<ffffffffa030b426>] i915_gem_object_pin_and_relocate+0x56/0x360 [i915]                                                           
Feb  9 15:28:52 C0050 kernel: [103911.030123]  [<ffffffffa030bc8f>] i915_gem_do_execbuffer+0x55f/0xab0 [i915]                                                                    
Feb  9 15:28:52 C0050 kernel: [103911.030159]  [<ffffffffa030c95d>] i915_gem_execbuffer+0x1bd/0x370 [i915]                                                                       
Feb  9 15:28:52 C0050 kernel: [103911.030185]  [<ffffffffa02799a8>] drm_ioctl+0x2e8/0x440 [drm]                                                                                  
Feb  9 15:28:52 C0050 kernel: [103911.030197]  [<ffffffff81133815>] vfs_ioctl+0x35/0xd0                                                                                          
Feb  9 15:28:52 C0050 kernel: [103911.030205]  [<ffffffff81133d28>] do_vfs_ioctl+0x78/0x420                                                                                      
Feb  9 15:28:52 C0050 kernel: [103911.030213]  [<ffffffff81134168>] sys_ioctl+0x98/0x110                                                                                         
Feb  9 15:28:52 C0050 kernel: [103911.030222]  [<ffffffff81002ffb>] system_call_fastpath+0x16/0x1b                                                                               
Feb  9 15:28:52 C0050 kernel: [103911.030237]  [<00007f7825fee7e7>] 0x7f7825fee7e7                                                                                               
Feb  9 15:28:52 C0050 kernel: [103911.030241] Mem-Info:                                                                                                                          
Feb  9 15:28:52 C0050 kernel: [103911.030245] Node 0 DMA per-cpu:                                                                                                                
Feb  9 15:28:52 C0050 kernel: [103911.030251] CPU    0: hi:    0, btch:   1 usd:   0                                                                                             
Feb  9 15:28:52 C0050 kernel: [103911.030256] CPU    1: hi:    0, btch:   1 usd:   0                                                                                             
Feb  9 15:28:52 C0050 kernel: [103911.030260] Node 0 DMA32 per-cpu:                                                                                                              
Feb  9 15:28:52 C0050 kernel: [103911.030266] CPU    0: hi:  186, btch:  31 usd:   0                                                                                             
Feb  9 15:28:52 C0050 kernel: [103911.030270] CPU    1: hi:  186, btch:  31 usd:   0                                                                                             
Feb  9 15:28:52 C0050 kernel: [103911.030278] active_anon:332828 inactive_anon:124449 isolated_anon:25                                                                           
Feb  9 15:28:52 C0050 kernel: [103911.030280]  active_file:9893 inactive_file:8868 isolated_file:7                                                                               
Feb  9 15:28:52 C0050 kernel: [103911.030283]  unevictable:1 dirty:5 writeback:2430 unstable:0                                                                                   
Feb  9 15:28:52 C0050 kernel: [103911.030286]  free:3735 slab_reclaimable:6239 slab_unreclaimable:8780                                                                           
Feb  9 15:28:52 C0050 kernel: [103911.030288]  mapped:14806 shmem:43822 pagetables:7599 bounce:0                                                                                 
Feb  9 15:28:52 C0050 kernel: [103911.030293] Node 0 DMA free:8008kB min:40kB low:48kB high:60kB active_anon:2304kB inactive_anon:2384kB active_file:940kB inactive_file:1076kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15296kB mlocked:0kB dirty:0kB writeback:0kB mapped:1044kB shmem:1172kB slab_reclaimable:1004kB slab_unreclaimable:48kB kernel_stack:8kB pagetables:20kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no                                                              
Feb  9 15:28:52 C0050 kernel: [103911.030315] lowmem_reserve[]: 0 1996 1996 1996                                                                                                 
Feb  9 15:28:52 C0050 kernel: [103911.030326] Node 0 DMA32 free:6932kB min:5696kB low:7120kB high:8544kB active_anon:1329008kB inactive_anon:495412kB active_file:38632kB inactive_file:34396kB unevictable:4kB isolated(anon):100kB isolated(file):28kB present:2043984kB mlocked:4kB dirty:20kB writeback:9720kB mapped:58180kB shmem:174116kB slab_reclaimable:23952kB slab_unreclaimable:35072kB kernel_stack:3208kB pagetables:30376kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no                        
Feb  9 15:28:52 C0050 kernel: [103911.030349] lowmem_reserve[]: 0 0 0 0                                                                                                          
Feb  9 15:28:52 C0050 kernel: [103911.030359] Node 0 DMA: 38*4kB 14*8kB 4*16kB 2*32kB 3*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8008kB                         
Feb  9 15:28:52 C0050 kernel: [103911.030386] Node 0 DMA32: 1477*4kB 40*8kB 6*16kB 3*32kB 2*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6932kB                     
Feb  9 15:28:52 C0050 kernel: [103911.030413] 121342 total pagecache pages                                                                                                       
Feb  9 15:28:52 C0050 kernel: [103911.030417] 58765 pages in swap cache                                                                                                          
Feb  9 15:28:52 C0050 kernel: [103911.030421] Swap cache stats: add 3404291, delete 3345526, find 737207/1110937                                                                 
Feb  9 15:28:52 C0050 kernel: [103911.030426] Free swap  = 397628kB                                                                                                              
Feb  9 15:28:52 C0050 kernel: [103911.030429] Total swap = 1542200kB                                                                                                             
Feb  9 15:28:52 C0050 kernel: [103911.038666] 522160 pages RAM                                                                                                                   
Feb  9 15:28:52 C0050 kernel: [103911.038669] 10212 pages reserved                                                                                                               
Feb  9 15:28:52 C0050 kernel: [103911.038671] 90616 pages shared                                                                                                                 
Feb  9 15:28:53 C0050 kernel: [103911.038673] 481352 pages non-shared
Comment 6 Jesse Barnes 2010-02-11 18:06:31 UTC
Glad it's stable at least.  The page allocation failures look like they're due to a low memory situation, or your memory is highly fragmented so that even an order 1 allocation (2 pages or 8k) is failing.  Does the latest -rc still have this problem?  Maybe you're running into a memory leak?
Comment 7 Ildar Nurislamov 2010-02-11 20:04:33 UTC
Sorry i cant use my work computer for stability tests anymore. And i havent seen any changes about i915 between rc6-rc7. So i better will wait for 2.6.33 release or at least some commits containing i915 fixes. And Maybe Arnd Bergmann will give your some helpful information.

PS: with both kernels (2.6.32.7 and 2.6.33.rc6) Xorg may stop drawing new frames. All applications are still working and even cursor moves and i can switch to terminal and back. And many lines about failing to flush some buffer appear in Xorg.log.. I havent tested new 2.6.32.8 yet - it contains some fixes for i915.
Comment 8 Ildar Nurislamov 2010-02-12 14:41:25 UTC
Ah. I wasnt able to hold myself from testing 2.6.33-rc7 kernel (from that SuSE repo). And... it is miracle. I cant hang my system now for a few hours. Dozens gtkperf tests was passed with AIGLX compositing enabled in KDE4.4 environment with all effects enabled. I dont know what is really responsible for this good change. All i have updated instead of kernel is KDE from 4.3 to 4.4 and pixman from 0.17.2 to 0.17.4 and some other Xorg packages. 

dmesg and Xorg.log both clear from any errors.. But of course its to early to celebrate. I can give some for tests to this PC on the next week.
Comment 9 Jesse Barnes 2010-02-12 17:40:28 UTC
Thanks for the update Ildar.  I'll tentatively close this one; hope it was fixed by a 2D driver or Mesa driver fix in your update.  Please re-open if you find the same hang on your PC later.

Note You need to log in before you can comment on or make changes to this bug.