A few seconds to a few minutes after the X server has started, the system locks up. Sometimes only one or two processes lock up before the system locks up entirely, sometimes the entire system locks up all at once - pressing caps lock does no longer trigger the keyboard caps lock LED. I haven't seen this behavior ever before, not even with any rc kernel. The same kernel with the same config runs fine in a virtual machine. Could this be the radeon driver that is causing trouble ? $ lspci|grep VGA 02:00.0 VGA compatible controller: ATI Technologies Inc RV610 [Radeon HD 2400 XT] kernel version: $ git show HEAD|head -n 4 commit 94c8a984ae2adbd9a9626fb42e0f2faf3e36e86f Merge: f9fa0bc 37adb89 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri Apr 8 11:47:35 2011 -0700 GRUB entry: title openSUSE 11.4 - 2.6.39-rc2+-debug root (hd0,0) kernel /boot/vmlinuz-2.6.39-rc2-debug+ root=/dev/md0 resume=/dev/md1 splash=silent showopts vga=0x31a nomodesetting initrd /boot/initrd-2.6.39-rc2-debug+
Is this a regression? Did previous kernels work ok? If so, can you bisect?
Does it work ok if you remove: vga=0x31a nomodesetting from your grub entry?
(In reply to comment #1) > Is this a regression? Did previous kernels work ok? If so, can you bisect? Yes, it's a regression - 2.6.38 and 2.6.38.2 run perfectly on the same system. I haven't had much luck with the bisect though - halfway bisecting I encountered a commit that made my system unbootable because reassembling the RAID1 /boot partition failed. The log I have so far is: $ git bisect start 'drivers/video' # good: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38 git bisect good 521cb40b0c44418a4fd36dc633f575813d59a43d # bad: [94c8a984ae2adbd9a9626fb42e0f2faf3e36e86f] Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 git bisect bad 94c8a984ae2adbd9a9626fb42e0f2faf3e36e86f # good: [da49252fb0392d8196833ef3da92e48fb371f8d7] Merge branch 'for-paul' of git://gitorious.org/linux-omap-dss2/linux git bisect good da49252fb0392d8196833ef3da92e48fb371f8d7 # good: [bf5f0019046d596d613caf74722ba4994e153899] video, sm501: add I/O functions for use on powerpc git bisect good bf5f0019046d596d613caf74722ba4994e153899 # good: [6b794743b2c5e21825d35b5d5dd57d6fcc388198] unicore32 framebuffer fix: get videomemory by __get_free_pages() and make it floatable git bisect good 6b794743b2c5e21825d35b5d5dd57d6fcc388198 unknown because of boot failure: 21cd72e7cb424f1686855602ec0fdc6e5830f249
'git bisect skip' to skip problematic commits.
Also, the drm is not in drivers/video, it's in drivers/gpu
Although I'm still busy bisecting, I'd like to report that I got the following hung task report with head b73a21fc66fee35b41db755abebfacba48b2fc76 (had already seen something similar before with 2.6.39-rc2): INFO: task kjournald:918 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kjournald D ffff880131b9ddb8 0 918 2 0x00000000 ffff880131b9dd20 0000000000000046 ffff880131b9dca0 ffffffff8108cd6d 0000000000000282 ffff880131b9dfd8 ffff880137729f40 ffff880131b9dfd8 ffff880131b9c000 ffff880131b9c000 ffff880131b9c000 ffff880131b9dfd8 Call Trace: [<ffffffff8108cd6d>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0 [<ffffffffa00c615e>] journal_commit_transaction+0x13e/0x1590 [jbd] [<ffffffff813e2535>] ? _raw_spin_unlock_irqrestore+0x65/0x80 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0 [<ffffffff81075280>] ? wake_up_bit+0x40/0x40 [<ffffffff8106258a>] ? del_timer_sync+0x8a/0xc0 [<ffffffff81062500>] ? try_to_del_timer_sync+0x110/0x110 [<ffffffffa00ca1a1>] kjournald+0xf1/0x250 [jbd] [<ffffffff81075280>] ? wake_up_bit+0x40/0x40 [<ffffffffa00ca0b0>] ? commit_timeout+0x10/0x10 [jbd] [<ffffffff81074c66>] kthread+0x96/0xa0 [<ffffffff813e4294>] kernel_thread_helper+0x4/0x10 [<ffffffff8103cebb>] ? finish_task_switch+0x7b/0xe0 [<ffffffff813e24ab>] ? _raw_spin_unlock_irq+0x3b/0x60 [<ffffffff813e2944>] ? retint_restore_args+0xe/0xe [<ffffffff81074bd0>] ? __init_kthread_worker+0x70/0x70 [<ffffffff813e4290>] ? gs_change+0xb/0xb no locks held by kjournald/918. INFO: task klauncher:5744 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. klauncher D 00000001000297b4 0 5744 5743 0x00000000 ffff88011dd73938 0000000000000046 ffff880100000000 ffffffff8108cbef ffffffff813e2535 ffff88011dd73fd8 ffff8801382e1f40 ffff88011dd73fd8 ffff88011dd72000 ffff88011dd72000 ffff88011dd72000 ffff88011dd73fd8 Call Trace: [<ffffffff8108cbef>] ? mark_held_locks+0x6f/0xa0 [<ffffffff813e2535>] ? _raw_spin_unlock_irqrestore+0x65/0x80 [<ffffffff8115b830>] ? __wait_on_buffer+0x30/0x30 [<ffffffff813df0e9>] io_schedule+0x59/0x80 [<ffffffff8115b83e>] sleep_on_buffer+0xe/0x20 [<ffffffff813df89a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff8115b830>] ? __wait_on_buffer+0x30/0x30 [<ffffffff813df978>] out_of_line_wait_on_bit_lock+0x78/0x90 [<ffffffff810752d0>] ? autoremove_wake_function+0x50/0x50 [<ffffffff8115b886>] __lock_buffer+0x36/0x40 [<ffffffffa00c56ad>] do_get_write_access+0x64d/0x660 [jbd] [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0 [<ffffffffa00c33e0>] ? start_this_handle+0x370/0x470 [jbd] [<ffffffffa00cb594>] ? journal_add_journal_head+0xf4/0x220 [jbd] [<ffffffffa00c58f1>] journal_get_write_access+0x31/0x50 [jbd] [<ffffffffa00efe6d>] __ext3_journal_get_write_access+0x2d/0x60 [ext3] [<ffffffffa00e2423>] ext3_reserve_inode_write+0x83/0xb0 [ext3] [<ffffffffa00e2494>] ext3_mark_inode_dirty+0x44/0x70 [ext3] [<ffffffffa00e4ffe>] ext3_dirty_inode+0x5e/0xa0 [ext3] [<ffffffff8115412f>] __mark_inode_dirty+0x3f/0x250 [<ffffffff8114627c>] file_update_time+0xec/0x170 [<ffffffff813dfecd>] ? mutex_lock_nested+0x27d/0x3a0 [<ffffffff810e4138>] __generic_file_aio_write+0x1f8/0x440 [<ffffffff810e43f5>] generic_file_aio_write+0x75/0xf0 [<ffffffff8112ca6a>] do_sync_write+0xda/0x120 [<ffffffff8110a2d7>] ? remove_vma+0x77/0x90 [<ffffffff8108cdbd>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8110a2d7>] ? remove_vma+0x77/0x90 [<ffffffff8112d186>] vfs_write+0xc6/0x170 [<ffffffff8112d481>] sys_write+0x51/0x90 [<ffffffff813e31eb>] system_call_fastpath+0x16/0x1b 2 locks held by klauncher/5744: #0: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff810e43d9>] generic_file_aio_write+0x59/0xf0 #1: (jbd_handle){+.+...}, at: [<ffffffffa00c33e0>] start_this_handle+0x370/0x470 [jbd] INFO: task okular:4180 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. okular D 000000010002a251 0 4180 5743 0x00000000 ffff880041d13aa8 0000000000000046 ffff880000000000 ffffffff8108cd6d 0000000000000282 ffff880041d13fd8 ffff880037a59f40 ffff880041d13fd8 ffff880041d12000 ffff880041d12000 ffff880041d12000 ffff880041d13fd8 Call Trace: [<ffffffff8108cd6d>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffffa00c32b4>] start_this_handle+0x244/0x470 [jbd] [<ffffffff8109a633>] ? is_module_address+0x33/0x60 [<ffffffff81075280>] ? wake_up_bit+0x40/0x40 [<ffffffffa00c370b>] journal_start+0xdb/0x120 [jbd] [<ffffffffa00eb436>] ext3_journal_start_sb+0x36/0x70 [ext3] [<ffffffffa00e2663>] ext3_setattr+0x1a3/0x210 [ext3] [<ffffffff81148556>] notify_change+0x116/0x360 [<ffffffff8112b803>] do_truncate+0x63/0x90 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0 [<ffffffff8113abac>] do_last+0x42c/0x820 [<ffffffff8113c3b0>] path_openat+0xd0/0x410 [<ffffffff81102ad3>] ? might_fault+0x53/0xb0 [<ffffffff8113c76f>] do_filp_open+0x7f/0xa0 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0 [<ffffffff813e2585>] ? _raw_spin_unlock+0x35/0x60 [<ffffffff81149834>] ? alloc_fd+0xf4/0x150 [<ffffffff8112c651>] do_sys_open+0x101/0x1e0 [<ffffffff8112c750>] sys_open+0x20/0x30 [<ffffffff813e31eb>] system_call_fastpath+0x16/0x1b 2 locks held by okular/4180: #0: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8112b7f7>] do_truncate+0x57/0x90 #1: (&sb->s_type->i_alloc_sem_key#4){+.+...}, at: [<ffffffff811486e0>] notify_change+0x2a0/0x360
Still occurs with 85f2e689a5c8fb6ed8fdbee00109e7f6e5fefcb6 (2.6.29-rc3+). Note: I'm still trying to find the offending commit via bisecting.
The result of the bisect process - not sure it's useful: # git bisect skip There are only 'skip'ped commits left to test. The first bad commit could be any of: 62e0ff1ef2d8ea0814487f73a7de431396a1e914 1fcf0069f4715f6f811466db68a547a348b4d5a9 94e948e6e43cd34e0e2ca496d5e90e4ff0d884f9 53f358a81e10e798f44af896ffacaedd7ac0269b e9c5db0b8dce1bcdc99ad26e718230810d6b5cff b73a21fc66fee35b41db755abebfacba48b2fc76 We cannot bisect more! git bisect start # skip: [1fcf0069f4715f6f811466db68a547a348b4d5a9] Merge branch 'common/fbdev' of master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6 git bisect skip 1fcf0069f4715f6f811466db68a547a348b4d5a9 # skip: [e9c5db0b8dce1bcdc99ad26e718230810d6b5cff] efifb: support AMD Radeon HD 6490 git bisect skip e9c5db0b8dce1bcdc99ad26e718230810d6b5cff # skip: [21cd72e7cb424f1686855602ec0fdc6e5830f249] savagefb: Set up I2C based on chip family instead of card id git bisect skip 21cd72e7cb424f1686855602ec0fdc6e5830f249 # skip: [47c87d930f3db4fc3a30505075e07f5597e2e953] fb: Reduce priority of resource conflict message git bisect skip 47c87d930f3db4fc3a30505075e07f5597e2e953 # good: [899631c7916b231ba6509c90dbc33221e9194029] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input git bisect good 899631c7916b231ba6509c90dbc33221e9194029 # bad: [b73a21fc66fee35b41db755abebfacba48b2fc76] video: s3c-fb: fix checkpatch errors and warning git bisect bad b73a21fc66fee35b41db755abebfacba48b2fc76 # skip: [6c5103890057b1bb781b26b7aae38d33e4c517d8] Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block git bisect skip 6c5103890057b1bb781b26b7aae38d33e4c517d8 # good: [247f99386100d1d1c369ba98120d2edebf5426fc] fbdev: sh_mobile_lcdcfb: fix module lock acquisition git bisect good 247f99386100d1d1c369ba98120d2edebf5426fc # skip: [53f358a81e10e798f44af896ffacaedd7ac0269b] Merge branch 'viafb-next' of git://github.com/schandinat/linux-2.6 git bisect skip 53f358a81e10e798f44af896ffacaedd7ac0269b # good: [3f086fe93f734ba76f2e130777687f81e0cbb318] viafb: initialize margins correct git bisect good 3f086fe93f734ba76f2e130777687f81e0cbb318 # skip: [62e0ff1ef2d8ea0814487f73a7de431396a1e914] fbcon: Remove unused 'display *p' variable from fb_flashcursor() git bisect skip 62e0ff1ef2d8ea0814487f73a7de431396a1e914 # skip: [94e948e6e43cd34e0e2ca496d5e90e4ff0d884f9] s3fb: fix Virge/GX2 git bisect skip 94e948e6e43cd34e0e2ca496d5e90e4ff0d884f9
On Sunday, April 17, 2011, Linus Torvalds wrote: > Is this machine running a RAID5 setup or something like that? > > There is a known interaction with the new block layer plugging code > and MD. The "hung task" report in that bugzilla looks very much like > that issue. And you do have "root=/dev/md0", so clearly there's some > md thing going on. > > And bisecting might not work all that well for it, because I suspect > it ends up being very much a matter of IO patterns how it triggers. > > Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens? > > Linus
Was fixed in 2.6.39-rc4.