Created attachment 22393 [details] Photo of the screen after the crash
Kernel 2.6.31 does not boot on my laptop (a DELL XPS M1330). I have tried rc1, rc2, rc3 and today's daily build but none of those will boot on my laptop. I get the following call trace with today's daily build:
    rcu_do_batch+0x27/0x90
    __rcu_process_callbacks+0xc8/0x100
    tick_handle_oneshot_broadcast+0xdd/0x100
    rcu_process_callbacks+0x20/0x40
    timer_interrupt+0x21/0x70
    handle_IRQ_event+0x56/0x120
    do_softirq+0x3c/0x40
    irq_exit+0x5c/0x70
    do_IRQ+0x4f/0xc0
    common_interrupt+0x29/0x30
    sys_getresuid+0x3b/0x70
    acpi_idle_enter_bm+0x19a/0x1c9
    cpuidle_idle_call+0x6f/0xc0
    cpu_idle+0x42/0x80
    start_secondary+0xae/0cd0
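Each entry in the call trace above uses the kernel's `symbol+0xoffset/0xsize` notation: the offset of the address from the start of the function, followed by the size of that function. As a rough sketch of how such a trace could be split into fields when it has to be typed in from a photo, the snippet below parses a few of the entries. The names `FRAME_RE` and `parse_frames` are hypothetical helpers made up for this illustration, not part of any kernel tool, and entries that do not match the format are reported rather than corrected.

```python
import re

# One frame of the form symbol+0xOFFSET/0xSIZE (hex digits in lower case).
FRAME_RE = re.compile(
    r"^(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)$"
)


def parse_frames(trace):
    """Split a whitespace-separated trace into (symbol, offset, size) tuples.

    Tokens that do not match the expected format are returned separately
    so that a transcription error is flagged instead of being guessed at.
    """
    frames, rejected = [], []
    for token in trace.split():
        match = FRAME_RE.match(token)
        if match:
            frames.append(
                (match["symbol"], int(match["offset"], 16), int(match["size"], 16))
            )
        else:
            rejected.append(token)
    return frames, rejected


# Three entries copied from the trace in this report.
SAMPLE = "rcu_do_batch+0x27/0x90 do_IRQ+0x4f/0xc0 start_secondary+0xae/0cd0"

frames, rejected = parse_frames(SAMPLE)
for symbol, offset, size in frames:
    print(f"{symbol}: offset {offset:#x} into a function of {size:#x} bytes")
print("could not parse:", rejected)
```

The last entry of the trace is reported as unparsable because its size field does not start with `0x`, which suggests a typo made while copying the trace from the screen photo.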
Created attachment 22394 [details] lspci -nnvv output
Hi Christophe, could you please confirm whether the laptop boots normally with an older kernel, for example 2.6.29 or 2.6.30? If so, please use git-bisect to identify the bad commit that causes the regression. Please also attach the output of acpidump and lspci -vxxx. Thanks.
Last kernel I booted was 2.6.28 and it was fine. I will get my hands on a 2.6.30 and test.
Created attachment 22403 [details] acpidump
Created attachment 22404 [details] lspci -vxxx
I have just booted a 2.6.30 kernel and it works fine, so the problem was introduced in v2.6.31. I'm not using git but premade packages from http://kernel.ubuntu.com/~kernel-ppa/mainline It is difficult to tell when this problem started because I have been affected by this bug since I switched to rc1: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/392709
I pulled the kernel tree up to commit 4075ea8c54a7506844a69f674990241e7766357b from git. This commit occurred after rc1 and supposedly fixes booting on the XPS M1330. Well, it does not boot on my XPS M1330.
The only recent commit I see that mentions this box is 412af97838828bc6d035a1902c8974f944663da6 "ACPI: video: prevent NULL deref in acpi_get_pci_dev()" but that went upstream immediately after 2.6.31-rc1, so you already have it. It is unclear whether this regression is related to ACPI. Does 2.6.31-rc boot before the ACPI changes went in at 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e? (Note that it will fail after that, due to the NULL deref in acpi_get_pci_dev() mentioned above -- so you can apply that patch manually to bisect forward...) Any difference with "maxcpus=1"? Any difference with "idle=poll"? How about with "acpi=off"?
None of these three boot parameters changed anything. So I guess this means ACPI is not the origin of the problem?
I compiled the kernel at the commit just before 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e (ACPI changes), and I manually applied the NULL check patch for acpi_get_pci_dev(). Sadly, it still does not boot, but it crashes earlier in the boot process and the call trace seems different. This second problem was probably fixed later (because I did not experience it with later kernels). Could someone tell me which patch I should apply for this? Call trace:
    shmem_acl_init
    shmem_mknod
    vfs_mknod
    sys_mknodat
    handle_mm_fault
    do_page_fault
    sys_mknod
    syscall_call
The second problem I'm experiencing seems to be this one: http://lkml.indiana.edu/hypermail/linux/kernel/0906.3/00506.html Apparently, it was fixed on June 24th (just before rc1) by commit c6223048259006759237d826219f0fa4f312fb47. I will apply this patch too and retest.
I compiled the kernel tree up to commit 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e (excluded) with the following cherry-picked patches:
    412af97838828bc6d035a1902c8974f944663da6 : ACPI NULL reference check
    c6223048259006759237d826219f0fa4f312fb47 : JFS ACL race condition fix
    d5bb68adda7cc179e8efadeaa3a283cb470f13a6 : Another JFS ACL race condition fix
The result: I don't get a crash, but I get a lot of Input/Output errors and X is not launched. The boot process is just stuck at some point. IIRC, I got the same result with rc3 by providing the parameter "idle=poll" (as advised in a previous post). I don't know whether this means that the crash did not occur or simply that I cannot see it. In any case, it does not boot. Since I managed to get a prompt this time (despite the I/O errors), I will post kern.log and dmesg.
Created attachment 22421 [details] dmesg output after I/O errors
Created attachment 22422 [details] kern.log after I/O errors
Created attachment 22423 [details] Input / Output Errors (Photo)
OK, still using the kernel tree up to commit 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e (excluded) with the following cherry-picked patches:
    412af97838828bc6d035a1902c8974f944663da6 : ACPI NULL reference check
    c6223048259006759237d826219f0fa4f312fb47 : JFS ACL race condition fix
    d5bb68adda7cc179e8efadeaa3a283cb470f13a6 : Another JFS ACL race condition fix
but with the same config file as I used for rc3, I can see the call trace this time. Therefore, the crash was not caused by the ACPI changes. Although the call trace is not exactly the same, it looks similar; I will post it anyway. The problem was introduced between the v2.6.30 release and commit 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e (ACPI changes).
Created attachment 22426 [details] Photo of the crash with pre-rc1 (9937ac0cc087b03d6d73f46a5d6b38c43626e60e)
When I said I used kernel tree up to 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e commit (excluded): I actually used tree up to 9937ac0cc087b03d6d73f46a5d6b38c43626e60e (included). I hope this is OK, I'm really not used to git and I did not know exactly how to do this (thus I chose a commit which happened slightly earlier in time, according to rc1 changelog).
I'm using git-bisect but it takes a lot of time. I'm providing my current results, hoping it will help:
    $ git bisect log
    git bisect start
    # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
    git bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
    # bad: [9937ac0cc087b03d6d73f46a5d6b38c43626e60e] MAINTAINERS: Change mailing list info for CRIS
    git bisect bad 9937ac0cc087b03d6d73f46a5d6b38c43626e60e
    # good: [e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a] powerpc/5121: make clock debug output more readable
    git bisect good e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a
    # good: [0dd5198672dd2bbeb933862e1fc82162e0b636be] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
    git bisect good 0dd5198672dd2bbeb933862e1fc82162e0b636be
I'm currently using git-bisect to pinpoint the bad commit but it takes a lot of time... I'm providing my current results, hoping it will help:
    $ git bisect log
    git bisect start
    # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
    git bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
    # bad: [9937ac0cc087b03d6d73f46a5d6b38c43626e60e] MAINTAINERS: Change mailing list info for CRIS
    git bisect bad 9937ac0cc087b03d6d73f46a5d6b38c43626e60e
    # good: [e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a] powerpc/5121: make clock debug output more readable
    git bisect good e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a
    # good: [0dd5198672dd2bbeb933862e1fc82162e0b636be] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
    git bisect good 0dd5198672dd2bbeb933862e1fc82162e0b636be
I'm making some progress:
    $ git bisect log
    git bisect start
    # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
    git bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
    # bad: [9937ac0cc087b03d6d73f46a5d6b38c43626e60e] MAINTAINERS: Change mailing list info for CRIS
    git bisect bad 9937ac0cc087b03d6d73f46a5d6b38c43626e60e
    # good: [e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a] powerpc/5121: make clock debug output more readable
    git bisect good e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a
    # good: [0dd5198672dd2bbeb933862e1fc82162e0b636be] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
    git bisect good 0dd5198672dd2bbeb933862e1fc82162e0b636be
    # good: [9b901ee0cb007eb4e2ee056e5b1c5c2837d53bdb] [WATCHDOG] wdt_pci.c: remove #ifdef CONFIG_WDT_501_PCI
    git bisect good 9b901ee0cb007eb4e2ee056e5b1c5c2837d53bdb
    # good: [7e0338c0de18c50f09aea1fbef45110cf7d64a3c] Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
    git bisect good 7e0338c0de18c50f09aea1fbef45110cf7d64a3c
Progressing:
    $ git bisect log
    git bisect start
    # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
    git bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
    # bad: [9937ac0cc087b03d6d73f46a5d6b38c43626e60e] MAINTAINERS: Change mailing list info for CRIS
    git bisect bad 9937ac0cc087b03d6d73f46a5d6b38c43626e60e
    # good: [e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a] powerpc/5121: make clock debug output more readable
    git bisect good e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a
    # good: [0dd5198672dd2bbeb933862e1fc82162e0b636be] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
    git bisect good 0dd5198672dd2bbeb933862e1fc82162e0b636be
    # good: [9b901ee0cb007eb4e2ee056e5b1c5c2837d53bdb] [WATCHDOG] wdt_pci.c: remove #ifdef CONFIG_WDT_501_PCI
    git bisect good 9b901ee0cb007eb4e2ee056e5b1c5c2837d53bdb
    # good: [7e0338c0de18c50f09aea1fbef45110cf7d64a3c] Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
    git bisect good 7e0338c0de18c50f09aea1fbef45110cf7d64a3c
    # good: [eebf8d86acf0db974dfaad8e8285f4e12ca488e2] V4L/DVB (12131): BUGFIX: An incorrect Carrier Recovery Loop optimization table was being
    git bisect good eebf8d86acf0db974dfaad8e8285f4e12ca488e2
    # good: [a10b32db34898d0db58a58ef76a70c374931bbff] kgdb: kgdboc console poll hooks for serial_txx9 uart
    git bisect good a10b32db34898d0db58a58ef76a70c374931bbff
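The bisect logs posted above all share the same shape: a `# good:` or `# bad:` comment followed by the matching `git bisect good|bad <sha>` command. As a small sketch of how such a log could be summarized (for example to check how many builds have been tested so far), the snippet below extracts the marked commits from the text. `MARK_RE`, `summarize_bisect_log` and `max_bisect_steps` are hypothetical names chosen for this illustration; the step estimate assumes a roughly linear history and is only an upper-bound approximation, since real kernel history contains merges.

```python
import re

# Matches the command lines of `git bisect log`, e.g. "git bisect good <sha>".
# The "# good: [<sha>] ..." comment lines are intentionally not matched.
MARK_RE = re.compile(r"git bisect (good|bad) ([0-9a-f]{40})")


def summarize_bisect_log(text):
    """Return {'good': [...], 'bad': [...]} in the order the commits were marked."""
    marks = {"good": [], "bad": []}
    for verdict, sha in MARK_RE.findall(text):
        marks[verdict].append(sha)
    return marks


def max_bisect_steps(candidate_commits):
    """Approximate number of builds needed to isolate one commit out of
    `candidate_commits` on a linear history: ceil(log2(n)), integer-only."""
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (candidate_commits - 1).bit_length()


# First entries of the log posted in this report.
LOG = (
    "git bisect start "
    "# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 "
    "git bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6 "
    "# bad: [9937ac0cc087b03d6d73f46a5d6b38c43626e60e] MAINTAINERS: Change mailing list info for CRIS "
    "git bisect bad 9937ac0cc087b03d6d73f46a5d6b38c43626e60e "
    "# good: [e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a] powerpc/5121: make clock debug output more readable "
    "git bisect good e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a"
)

marks = summarize_bisect_log(LOG)
print("good:", len(marks["good"]), "bad:", len(marks["bad"]))
print("most recent good commit:", marks["good"][-1])
```

Because each tested build roughly halves the remaining range, the number of builds grows only logarithmically with the number of candidate commits, but every step still costs a full kernel compile and a reboot, which is why the process feels slow.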
When I said I used the kernel tree up to commit 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e (excluded), I actually used the tree up to 9937ac0cc087b03d6d73f46a5d6b38c43626e60e (included). Apparently, this was a mistake: 9937ac0cc087b03d6d73f46a5d6b38c43626e60e occurred *after* 0c26d7cc31cd81a82be3b9d7687217d49fe9c47e according to git log. I will test commit 936940a9c7e3d99b25859bf1ff140d8c2480183a (Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6) right now. This one occurred just before the ACPI changes. I have a feeling the ACPI commit (0c26d7cc31cd81a82be3b9d7687217d49fe9c47e) is the faulty one after all. I will confirm this in a few hours.
Apparently, [936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 is crashing already. So the ACPI commit does not seem to be the problem. I'll try to continue bisecting, but the bug seems to be between: [6122af3743a48dddae19810626dd7c9c8e6c1df8] asus_acpi: Deprecate in favor of asus-laptop and [936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Thanks to git-bisect, I identified the following commit as the problem:
    commit 936940a9c7e3d99b25859bf1ff140d8c2480183a
    Merge: 09ce42d 1cbd20d
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date: Wed Jun 24 10:03:12 2009 -0700

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
      switch xfs to generic acl caching helpers
      helpers for acl caching + switch to those
      switch shmem to inode->i_acl
      switch reiserfs to inode->i_acl
      switch reiserfs to usual conventions for caching ACLs
      reiserfs: minimal fix for ACL caching
      switch nilfs2 to inode->i_acl
      switch btrfs to inode->i_acl
      switch jffs2 to inode->i_acl
      switch jfs to inode->i_acl
      switch ext4 to inode->i_acl
      switch ext3 to inode->i_acl
      switch ext2 to inode->i_acl
      add caching of ACLs in struct inode
      fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
      cleanup __writeback_single_inode
      ... and the same for vfsmount id/mount group id
      Make allocation of anon devices cheaper
      update Documentation/filesystems/Locking
      devpts: remove module-related code
      ...
Note that I'm using jfs so it could be related.
Since ACPI does not seem to be the problem, changing component to FileSystem.
Created attachment 22444 [details] mount output
I got a bit more precise now, still using git-bisect. The problem occurred after : [6582a0e6f6bc7bf64817b9e1a424782855292ab0] switch ext3 to inode->i_acl and of course before: [936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Still using git-bisect. The problem occurred after: [290c263bf83cd78e53b1aa3b42165f588163f2be] switch jffs2 to inode->i_acl and of course before: [936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Still using git-bisect. The problem occurred after: [281eede0328c84a8f20e0e85b807d5b51c3de4f2] switch reiserfs to inode->i_acl and of course before: [936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
I have just confirmed that the patch proposed by Stefan Bader on this bug report works: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/396780/comments/33
Created attachment 22475 [details] Do not release acl when returning
Christophe tried the patch above and it solved the crashes he was experiencing.