Bug 14129

Summary: 2.6.31 regression - pci_get_slot oops, udev boot hang - toshiba X200
Product: ACPI
Reporter: chepioq (chepioq)
Component: Config-Other
Assignee: acpi_config-other
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, devzero, lenb, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 13615
Attachments: lspci -vvv.txt
dmesg.boot.2.6.30
dmesg.boot.2.6.30.F12
2.6.31.boot
boot.log.2.6.30.txt
pci_root: fix NULL pointer deref after resume from suspend

Description chepioq 2009-09-06 07:01:47 UTC
My laptop is a Toshiba X200 with an Intel Core 2 Duo 7500 (Centrino) CPU and an NVIDIA 8700M GT graphics card.
I want to test Fedora Rawhide (the future F12), but with kernel 2.6.31 Fedora doesn't boot (it works with kernel 2.6.30).
When I install a 2.6.31 kernel (the latest is 2.6.31-0.199.rc8.git2.fc12.x86_64) and try to boot it, I get a black screen and the laptop freezes (the keyboard doesn't work, the fan runs at full speed, and I have to do a hard reset).
To test, I compiled a 2.6.31 kernel myself and installed it, with the same result.
So I think it is not a Fedora problem but a kernel bug affecting my laptop.
I had a problem with kernel 2.6.29 on this laptop (see this bug: http://bugzilla.kernel.org/show_bug.cgi?id=12735 ) and you resolved it with a patch.
Comment 1 Roland Kletzing 2009-09-06 10:15:56 UTC
please attach lspci -vvv and dmesg of working boot and describe more details about the failing boot. what messages do you see? if you see nothing, can you remove params like "quiet" or "splash" and add "vga=normal" to the bootparams?
Comment 2 chepioq 2009-09-06 12:50:13 UTC
I removed quiet and added vga=normal, and now I see all the boot messages, but they scroll too fast for me to note them down.
I see the progress bar, but when the progress bar finishes, the boot seems to stop.
I waited 15 minutes and then rebooted with CTRL+ALT+BACKSPACE.
I looked in the logs, but there is no dmesg for this boot.
I attach lspci -vvv and dmesg for a boot with the 2.6.30 kernel.
Comment 3 chepioq 2009-09-06 12:51:05 UTC
Created attachment 23020
lspci -vvv.txt
Comment 4 chepioq 2009-09-06 12:51:47 UTC
Created attachment 23021
dmesg.boot.2.6.30
Comment 5 Roland Kletzing 2009-09-06 13:16:25 UTC
>nvidia: module license 'NVIDIA' taints kernel.
>Disabling lock debugging due to kernel taint

can you please retry without the nvidia module? 
(either temporarily rename it or uninstall the appropriate package)

most kernel developers won't look at your problem if closed-source modules are loaded, as they are often the source of problems which can't be solved here.

this is just to make sure that the nvidia module is NOT the source of the problem.

what progress bar do you see? 

please boot into text-mode only and check if that works without problems. 

if that is the case, try starting X11 manually.
Comment 6 chepioq 2009-09-06 13:27:44 UTC
Sorry, for F12 I have no nvidia kernel module installed...
But I made a mistake and attached the dmesg from my F11.
I am creating a new attachment with the correct dmesg (for F12).
Comment 7 chepioq 2009-09-06 13:29:47 UTC
Created attachment 23024
dmesg.boot.2.6.30.F12
Comment 8 chepioq 2009-09-06 15:59:26 UTC
I also booted in text mode, with the same result.
The boot hangs after the progress bar, and I am not able to run startx.
The progress bar is Fedora's boot bar (a blue-white line) which advances during boot; after it I normally get the login screen, but not with the 2.6.31 kernel.
Comment 9 Roland Kletzing 2009-09-06 16:28:11 UTC
i see:
Command line: ro root=UUID=215e72f1-9f2d-4940-96d8-f0f28c3854a3 rhgb vga=0x365 quiet LANG=fr_FR.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=fr-latin9 rd_plytheme=charge

i wrote:
>and describe more details
>about the failing boot. what messages do you see? if you see nothing, can you
>remove params like "quiet" or "splash" and add "vga=normal" to the bootparams?

apparently, you did not remove "quiet" and did not add "vga=normal" to the bootparams.
please do that.
furthermore remove "rhgb", as the rhgb param enables the progress bar, afaik. you can try hitting ESC during boot, which may also disable the progress bar. instead of vga=normal you can also use vga=ask and choose a console mode with enough lines to print the whole kernel trace/oops (if there is one)
Comment 10 chepioq 2009-09-06 18:03:21 UTC
I had removed quiet and added vga=normal...
Now I removed "rhgb" and used vga=ask.
I see an error message, and I copied it by hand... (see attachment 2.6.31.boot.txt)
Comment 11 chepioq 2009-09-06 18:04:18 UTC
Created attachment 23026
2.6.31.boot
Comment 12 Roland Kletzing 2009-09-10 19:18:35 UTC
can you change /etc/udev/udev.conf -> udev_log="debug"  and post log results of good boot with working kernel and bad boot with 2.6.31 ?
Comment 13 chepioq 2009-09-11 05:23:33 UTC
I changed /etc/udev/udev.conf -> udev_log="debug", and with that my 2.6.30 kernel won't boot either; after pressing "esc" I see many lines beginning with udev. I waited 10 minutes and then rebooted my laptop.
But I have a boot.log for this boot (see attachment boot.log.2.6.30.txt).

With the 2.6.31 kernel the same thing happens, but I have no boot.log for that boot.
Comment 14 chepioq 2009-09-11 05:24:28 UTC
Created attachment 23061
boot.log.2.6.30.txt
Comment 15 chepioq 2009-09-13 05:27:01 UTC
I took 2 photos of the 2.6.31 boot:
Photo 1: photo of the boot:
http://pix.toile-libre.org/?img=1252817142.jpg

Photo 2: photo after CTRL+ALT+DEL:
http://pix.toile-libre.org/?img=1252818460.jpg

In case that helps...
Comment 16 chepioq 2009-09-15 13:50:36 UTC
I can boot kernel 2.6.31 on my Toshiba laptop by adding this boot option:
acpi=off

But it is not an acceptable solution, because with it I have no temperature monitoring for the CPU and GPU...
Comment 17 Roland Kletzing 2009-09-15 19:34:56 UTC
this could be related: http://lkml.org/lkml/2009/7/20/426
Comment 18 chepioq 2009-09-16 08:56:16 UTC
To test, I compiled and installed a 2.6.31 kernel with the patch from http://lkml.org/lkml/2009/7/20/426 and removed the acpi=off boot option, but it doesn't work...
Same result as before...
Comment 19 chepioq 2009-09-20 18:11:47 UTC
To test, I tried something else.
I use Fedora 11, and to check whether the Fedora patches are the origin of the problem, I took a 2.6.31 kernel (here: http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.tar.bz2) and compiled and installed it on my Fedora 11.
This kernel doesn't boot unless I add acpi=off at boot (same result as before...).
I am not a programmer, and I can't make a patch.
But if I understand the log correctly, the problem is with LNXVIDEO and LNXSYSTM...
Comment 20 chepioq 2009-09-22 06:33:29 UTC
Hi...
I just want to add a comment:
I think it is a regression, because ACPI works on my laptop (Toshiba X200) with a 2.6.30 kernel and doesn't work with a 2.6.31 kernel.
If you want, I can do a regression test, if you explain to me how to do it.
Comment 21 chepioq 2009-09-23 16:31:02 UTC
Please, can you move the component from "Other" to "ACPI"?
Comment 22 Roland Kletzing 2009-09-23 19:30:07 UTC
sorry, i don't have the proper permission for this. i also don't know who's looking at this.

>If you want, I can do a regression test, if you explain me how I can do it.

yes, that would be great

you can systematically and quite efficiently search for the "offending" patch (i.e. git commit) which introduced the problem.

the magic word to google for is "git bisect". 

if you have difficulties finding a good tutorial on how to git bisect, please let me know.
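The idea behind "git bisect" is a binary search over the history between a known-good and a known-bad kernel: each test build halves the remaining range, so about log2(N) builds find the first bad commit among N candidates. The sketch below is a minimal illustration only, assuming a simplified linear history and a hypothetical `is_bad()` test standing in for "build, boot, and check"; real git bisect walks a commit graph, not a list.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_bad_commit(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Binary-search a linear history for the first bad commit.

    Assumed preconditions (the same ones bisecting relies on):
    commits[0] is known good, commits[-1] is known bad, and once a
    commit is bad, every later commit is bad too.
    Returns the first bad commit and the number of tests performed.
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):  # hypothetical "build, boot, check" step
            bad = mid
        else:
            good = mid
    return commits[bad], tests


if __name__ == "__main__":
    # Hypothetical history: commits numbered 1..1000, regression introduced at 600.
    history = list(range(1, 1001))
    culprit, tests = first_bad_commit(history, lambda c: c >= 600)
    print(f"first bad commit: {culprit} (found after {tests} tests)")
```

With 1000 candidate commits the loop needs at most 10 tests. In the real workflow, `git bisect start`, `git bisect bad <rev>` and `git bisect good <rev>` set the two endpoints, and each subsequent `git bisect good` or `git bisect bad` answer plays the role of `is_bad()` here.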
Comment 23 chepioq 2009-09-25 20:02:02 UTC
Hi
After searching the web, I am not able to do a bisect; my English is too bad and I didn't understand how to do it...
But I found http://bugzilla.kernel.org/show_bug.cgi?id=14211 and the problem is the same as mine.
Unfortunately, I tried the patch and it doesn't work on my laptop (I tried both ways, patch and patch -R).
In case that helps...
Comment 24 chepioq 2009-10-04 18:24:27 UTC
Hi...
After much searching and testing, I was able to do a git bisect.
The result is:

80ffdedf6020a77adcd06c01cfe6c488312b28f8 is the first bad commit
commit 80ffdedf6020a77adcd06c01cfe6c488312b28f8
Author: Alexander Chiang <achiang@hp.com>
Date:   Wed Jun 10 19:55:55 2009 +0000

    ACPI: kill acpi_get_pci_id

    acpi_get_pci_dev() is better, and all callers have been converted, so
    eliminate acpi_get_pci_id().

    Signed-off-by: Alex Chiang <achiang@hp.com>
    Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

:040000 040000 d4df802ef1782e3ec795be4fb015f1b797613c4e 0499fac9c0a9b479379f42d120ed72d75b9c2174 M      drivers
:040000 040000 a86418d0e1e49735be64671e6802010cb960d6da 66a3d0b724af6f89d064f9d026e6f68a02a2517d M      include

In case that helps...
Comment 25 Alex Chiang 2009-10-05 18:23:54 UTC
Created attachment 23269
pci_root: fix NULL pointer deref after resume from suspend

Can you please try this patch that Rafael wrote?

Thanks.
Comment 26 chepioq 2009-10-05 19:59:16 UTC
I tried your patch with the latest Fedora 11 kernel (2.6.31.1-58.fc12.x86_64), and with this patch I can boot without the acpi=off option...
Thanks a lot, and I hope this patch will be included in future kernels.
Thanks for your responsiveness and your knowledge.
chepioq
Comment 27 Rafael J. Wysocki 2009-10-05 23:31:44 UTC
Patch : http://bugzilla.kernel.org/attachment.cgi?id=23269
Handled-By : Alex Chiang <achiang@hp.com>
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Comment 28 Rafael J. Wysocki 2009-10-05 23:36:20 UTC
*** Bug 14317 has been marked as a duplicate of this bug. ***
Comment 29 Rafael J. Wysocki 2009-10-05 23:37:36 UTC
Ignore-Patch : http://bugzilla.kernel.org/attachment.cgi?id=23269
Patch : http://patchwork.kernel.org/patch/51834/
Comment 30 Rafael J. Wysocki 2009-10-26 19:22:38 UTC
Fixed by commit 497fb54f578efd2b479727bc88d5ef942c0a1e2d.