Latest working kernel version: 2.6.22
Earliest failing kernel version: 2.6.24
Distribution: Debian
Hardware Environment: Samsung Q45 laptop
Software Environment:
Problem Description: When video.ko loads, the entire system freezes, requiring a power cycle to recover.
Steps to reproduce: 'modprobe video'

I previously wrote about this to LKML but didn't get a response. I'm filing it here to keep track of it. My original message was <http://lkml.org/lkml/2008/4/10/85>.

lspci describes my system thusly:

00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub [8086:2a00] (rev 03)
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 03)
00:02.1 Display controller [0380]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a03] (rev 03)
00:1a.0 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Contoller #4 [8086:2834] (rev 03)
00:1a.1 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 [8086:2835] (rev 03)
00:1a.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 [8086:283a] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation 82801H (ICH8 Family) HD Audio Controller [8086:284b] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 [8086:283f] (rev 03)
00:1c.1 PCI bridge [0604]: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 [8086:2841] (rev 03)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 [8086:2830] (rev 03)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 [8086:2831] (rev 03)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 [8086:2832] (rev 03)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 [8086:2836] (rev 03)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev f3)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller [8086:2815] (rev 03)
00:1f.2 IDE interface [0101]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller [8086:2828] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation 82801H (ICH8 Family) SMBus Controller [8086:283e] (rev 03)
02:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG Network Connection [8086:4222] (rev 02)
03:00.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88E8039 PCI-E Fast Ethernet Controller [11ab:4353] (rev 15)
04:09.0 CardBus bridge [0607]: Ricoh Co Ltd RL5c476 II [1180:0476] (rev b4)
04:09.1 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C552 IEEE 1394 Controller [1180:0552] (rev 09)
04:09.2 SD Host controller [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter [1180:0822] (rev 18)
04:09.3 System peripheral [0880]: Ricoh Co Ltd R5C843 MMC Host Controller [1180:0843]
04:09.4 System peripheral [0880]: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter [1180:0592] (rev 09)
04:09.5 System peripheral [0880]: Ricoh Co Ltd xD-Picture Card Controller [1180:0852] (rev 04)

The freeze also occurs with Debian's 2.6.24 and 2.6.25-rc8 kernels.
First, please clear CONFIG_THERMAL and see if the problem still exists. If yes, please attach the acpidump output and the system log when it's frozen.
What's the right menu option to disable CONFIG_THERMAL? I don't see it in any of the Kconfig files...
Ok, I found it... but disabling it requires me to disable CONFIG_ACPI, which disables building of video.ko, the loading of which is how I reproduce the problem. Is it sufficient for me to boot into such a kernel and report whether it freezes or finishes booting?
Created attachment 15753 [details] acpidump output
So, booting into the kernel with CONFIG_THERMAL not set worked fine (though X was really slow to start up and switch VTs for some reason). Unfortunately, loading video.ko causes an instant freeze with no further kernel messages printed to the screen. I guess the system locks up before anything can be printed?
I've tracked the problem down with git-bisect:

1ba90e3a87c46500623afdc3898573e4a5ebb21b is first bad commit
commit 1ba90e3a87c46500623afdc3898573e4a5ebb21b
Author: Thomas Renninger <trenn@suse.de>
Date: Mon Jul 23 14:44:41 2007 +0200

    ACPI: autoload modules - Create __mod_acpi_device_table symbol for all ACPI drivers

    modpost is going to use these to create e.g. acpi:ACPI0001 in modules.alias.

    Signed-off-by: Thomas Renninger <trenn@suse.de>
    Signed-off-by: Len Brown <len.brown@intel.com>

:040000 040000 efa53eaedfeaaca49a398a1bedc29ba0e1390d12 a143c7e1f64b711ed12c6d836832953585f0516b M drivers
So loading video.ko on 2.6.22 does in fact crash the system; it's just that it has only been loaded automatically since that commit, just after 2.6.23-rc1.
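For anyone who wants to check whether a given kernel will autoload the module this way, here is a minimal sketch. It assumes modules.alias contains lines of the form "alias <pattern> <module>" and that the ACPI patterns start with "acpi"; the function name acpi_aliases_for is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the ACPI alias patterns that would cause
# a module to be autoloaded, by scanning a modules.alias file.
# Usage: acpi_aliases_for MODULE [ALIAS_FILE]
acpi_aliases_for() {
    module=$1
    # Default to the alias file of the running kernel (assumed location).
    alias_file=${2:-/lib/modules/$(uname -r)/modules.alias}
    # Assumed line format: alias <pattern> <module>
    awk -v m="$module" \
        '$1 == "alias" && $2 ~ /^acpi/ && $3 == m { print $2 }' \
        "$alias_file"
}
```

If "acpi_aliases_for video" prints nothing, the module should only get loaded by an explicit modprobe, which would match the behaviour I saw before the bisected commit. If it prints a pattern, blacklisting the module in the modprobe configuration is probably the quickest way to keep the machine bootable in the meantime.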
An Ubuntu user reported a similar freeze when loading video.ko. His laptop is a Samsung Q70, which I believe uses the same hardware internally as the Q45. From <https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/216221>:

I'll end up with this while trying to modprobe video.ko to adjust the screen backlight.

[ 126.979966] ACPI Error (utglobal-0126): Unknown exception code: 0xFFFFFFFE [20070126]
[ 126.979982] Pid: 5996, comm: modprobe Not tainted 2.6.24-15-generic #1
[ 126.979995] [<c024c694>] acpi_format_exception+0x35/0x3f
[ 126.980016] [<c024b496>] acpi_ut_exception+0xc/0x55
[ 126.980030] [<f8a209e4>] acpi_video_bus_add+0xb0b/0xb1a [video]
[ 126.980049] [<c01d3a5d>] sysfs_addrm_start+0x6d/0xb0
[ 126.980063] [<c01d47a3>] sysfs_create_link+0x93/0x110
[ 126.980083] [<c024fb2f>] acpi_device_probe+0x33/0x7c
[ 126.980097] [<c027ebc8>] driver_probe_device+0x88/0x190
[ 126.980105] [<c0212370>] kobject_uevent_env+0xf0/0x3d0
[ 126.980125] [<c027ee3e>] __driver_attach+0x9e/0xa0
[ 126.980136] [<c027dffb>] bus_for_each_dev+0x3b/0x60
[ 126.980152] [<c027ea46>] driver_attach+0x16/0x20
[ 126.980159] [<c027eda0>] __driver_attach+0x0/0xa0
[ 126.980166] [<c027e37a>] bus_add_driver+0x8a/0x1e0
[ 126.980184] [<f8bcc02f>] acpi_video_init+0x2f/0x4d [video]
[ 126.980193] [<c01516b6>] sys_init_module+0x126/0x19c0
[ 126.980248] [<c024fe36>] acpi_bus_register_driver+0x0/0x38
[ 126.980274] [<c01053c2>] sysenter_past_esp+0x6b/0xa9
[ 126.980300] =======================

Not sure why he gets kernel messages while I get a total freeze.
No, your problem is not a duplicate of bug #9761. It would be great if you could use a serial console to get the system log when it freezes, but I guess there is no serial console on your machine. Alternatively, a screenshot taken when the system freezes would help. It would also be great if you could add some printks to narrow down the problem, i.e. to find out which function the system is executing when it freezes. (Or I can send you a debug patch which does the same thing.)
Unfortunately there is no serial console. I will add some debug printk calls to video.ko however. Do you think it's worth trying out netconsole?
Oh, there are no messages at all when loading video.ko on the console so a screenshot without additional debugging info won't be very useful :)
(In reply to comment #10)
> Unfortunately there is no serial console. I will add some debug printk calls to
> video.ko however. Do you think it's worth trying out netconsole?

I'm not familiar with netconsole. But if it can get more of the system log, YES. :)
Ok, I played around a bit with drivers/acpi/video.c and put in printk's. This is the result I got:

[ 848.382135] ACPI: Entering acpi_video_init
[ 848.382216] ACPI: Entering acpi_video_bus_add
[ 848.382265] ACPI: Entering acpi_video_bus_find_cap
[ 848.382322] ACPI: Entering acpi_video_bus_check
[ 848.382373] ACPI: Entering acpi_video_bus_add_fs
[ 848.382426] ACPI: Entering acpi_video_bus_get_devices
[ 848.382478] ACPI: Entering acpi_video_device_enumerate
[ 848.382528] ACPI: Calling acpi_evaluate_object

So the code leading up to the crash is:

static int acpi_video_device_enumerate(struct acpi_video_bus *video)
{
	int status;
	int count;
	int i;
	struct acpi_video_enumerated_device *active_device_list;
	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
	union acpi_object *dod = NULL;
	union acpi_object *obj;

	/* debug: trace function entry (placed after the declarations) */
	printk(KERN_ERR PREFIX "Entering %s\n", __FUNCTION__);

	/* debug: this is the last message printed before the freeze */
	printk(KERN_ERR PREFIX "Calling acpi_evaluate_object\n");
	status = acpi_evaluate_object(video->device->handle, "_DOD", NULL, &buffer);

And it seems to crash in the acpi_evaluate_object function, i.e. while evaluating the _DOD method. I'll see if I can get more info. Let me know if I should test something specific.
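In case it helps others repeat this on their own machines: once the log has been captured (by hand or over the network), the last function entered before the freeze can be pulled out mechanically. This is only a sketch that assumes the "ACPI: Entering <function>" message format produced by my own printk's above; the helper name last_entered is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the last function that logged
# an "ACPI: Entering <function>" line in a captured kernel log.
# Usage: last_entered LOGFILE
last_entered() {
    # The function name is the last whitespace-separated field of the line.
    awk '/ACPI: Entering / { fn = $NF } END { if (fn != "") print fn }' "$1"
}
```

Run against the log above, this should name acpi_video_device_enumerate, since the "Calling acpi_evaluate_object" line that follows it is not an "Entering" message.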
OK, I noticed CONFIG_ACPI_DEBUG and CONFIG_ACPI_DEBUG_FUNC_TRACE. Why has nobody suggested their use before? Anyway, I compiled 2.6.25 with these options. This gives me the following detailed log. Unfortunately, I did not get everything that's written on the screen via netconsole, and I don't have a digital camera. I used 0xFFFFFFFF for debug_level and debug_layer. That was probably too detailed. Maybe other parameters would give more useful output; I just don't know which. Any ideas?
Created attachment 15910 [details] acpidump output in Samsung Q45
OK, with debug_level set to ACPI_LV_ALL I got the full output via netconsole. Attaching it. Let me know if I can do anything else to help.
Created attachment 15911 [details] ACPI debug output
Created attachment 15969 [details] customized dsdt Please try this customized DSDT, and attach the dmesg output whether or not it hangs when loading the video driver.
Hello, I have a Samsung Q45 as well and have the same problem. I'm on BIOS level 14ST in case that matters. I tried the above DSDT file and it didn't seem to make any difference. Should a disassembled version of this DSDT file be identical to the original version, or did you make any changes to it? The kernel seemed to compile OK with the include DSDT option, so I think I am using the supplied DSDT, but I don't know how to tell; I can't see any mention of replacing the DSDT in dmesg. I don't have much useful info to add, as I haven't worked out how to do netconsole etc. to get a dmesg after the video module is loaded. I have played around with the video.c file a bit and found that only the NVID object locks the computer, not the GFX0 object. I skipped NVID and the module loaded OK. Is there any more info I can get for you that you would find useful?
I'll test the DSDT tomorrow, I don't have the time right now, sorry...

Alex, I use the following script for netconsole (after booting with init=/bin/bash), you should be able to adapt it easily:

On the receiving end I run:

nc -l -p 6666 -u | tee kernel.log

On the sending end:

---------
#!/bin/bash
mount -o remount,rw /
mount -t sysfs /sys /sys
mount -t proc none /proc
sysctl -w kernel.printk="7 4 1 7"

# Load network driver
modprobe sky2
sleep 1
ifconfig eth0 192.168.2.101

# Let the interface come up...
sleep 2

# enable netconsole
# 4444@... is source port@ip/interface, 6666@... is destination
# port@ip/mac-address
modprobe netconsole netconsole=4444@192.168.2.101/eth0,6666@192.168.2.103/00:13:8F:D3:45:B2

# enable ACPI debugging (but not too much :)
echo 0x0077FF5F > /sys/module/acpi/parameters/debug_level
-----

BTW, I didn't know that 14ST exists, I did an upgrade at most a month ago, and I'm at 12ST... Will check for new versions tomorrow...
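Since the netconsole parameter is the part most likely to need adapting, here is a small sketch that assembles it from its pieces. The layout (source port@ip/interface, then destination port@ip/MAC) is the one described in the script comments above; the function name netconsole_arg is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: build the netconsole= module parameter.
# Usage: netconsole_arg SRC_PORT SRC_IP SRC_DEV DST_PORT DST_IP DST_MAC
netconsole_arg() {
    # Layout: netconsole=<src port>@<src ip>/<dev>,<dst port>@<dst ip>/<dst mac>
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

With the addresses from my setup it would be used as: modprobe netconsole "$(netconsole_arg 4444 192.168.2.101 eth0 6666 192.168.2.103 00:13:8F:D3:45:B2)". Replace the IPs, interface and MAC address with your own.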
Created attachment 16000 [details] dmesg output with ACPI debugging enabled I tried it with the new DSDT but it still locks up the machine. I'll attach the dmesg output. As far as I could see netconsole captured everything that was written on the screen. Let me know if I can test anything else.
Created attachment 16017 [details] customized dsdt

Please try this one. Please boot with acpi_debug.level=0x0f and attach the dmesg output. :)

Note: this DSDT won't fix the video hang issue; it only provides some info so that we can see which piece of code causes the problem. I suspect the following AML code is the culprit:

Store (0x1E, SMIF)
Store (0x02, PVFN)
Store (Zero, TRP0)

So please attach the dmesg output with the customized DSDT and the lspci output.
The attached DSDT somehow has the wrong format. It should be something that a C compiler can understand. Unfortunately I have no idea how to convert it... Can you provide it in the right format to be included in the kernel compilation?
Created attachment 16035 [details] customized dsdt Oops, my mistake. Please try this one. :)
Created attachment 16036 [details] dmesg with customized DSDT That's the output with the latest DSDT. It just prints one line when loading video.ko: [ACPI Debug] String: [0x07] "In _DOD"
Created attachment 16037 [details] lspci output
Hi, Mike, thanks for the quick response.

Store ("In _DOD", Debug)
Store (0x1E, SMIF)
Store (0x02, PVFN)
Store (Zero, TRP0)
Store ("CADL =", Debug)
Store (CADL, Debug)

There are only three lines between "In _DOD" and the next debug message, and that next message never appears in your log, so the laptop hangs when executing one of these three statements. They trap into the BIOS to do something that we don't know yet. To make sure, I'll comment out these three lines and build a new customized DSDT. I hope you can give it a try.
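To read the position of the hang off a captured log, it is enough to look at the last debug message that made it out before the freeze. The following is only a sketch: it assumes the "[ACPI Debug]" prefix seen in the dmesg output reported above, and the helper name last_acpi_debug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the last "[ACPI Debug]" line of a captured log.
# The AML statements following the corresponding Store (..., Debug) in the
# DSDT are the candidates for the hang.
# Usage: last_acpi_debug LOGFILE
last_acpi_debug() {
    # -F: match the bracketed prefix literally, not as a character class.
    grep -F '[ACPI Debug]' "$1" | tail -n 1
}
```

If the last line is the "In _DOD" string, the hang lies between that Store and the next debug Store in the method.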
Created attachment 16038 [details] new dsdt
Created attachment 16042 [details] dmesg with latest DSDT

We're getting closer. This time I get an error message BUT the module actually loads (i.e. no hang)! And it kind of works, too (in that "echo N > /proc/acpi/video/NVI0/LCD/brightness" changes the brightness of my display; I don't know what else to test). The messages I get when loading the module are:

[ACPI Debug] String: [0x07] "In _DOD"
[ACPI Debug] String: [0x06] "CADL ="
[ACPI Debug] Integer: 0x0000000000000002
[ACPI Debug] String: [0x06] "PSIZ ="
[ACPI Debug] Integer: 0x0000000000000001
ACPI Error (utglobal-0126): Unknown exception code: 0xFFFFFFFE [20070126]
Pid: 1146, comm: modprobe Tainted: G A 2.6.25-x86-latest.git-06791-gd269f
[<c023c07d>] acpi_format_exception+0x2b/0x35
[<c023a2c2>] acpi_ut_exception+0xc/0x43
[<f8866d06>] acpi_video_bus_add+0xb59/0xb6b [video]
[<c018e7f6>] sysfs_find_dirent+0x13/0x23
[<c018f355>] sysfs_create_link+0xa3/0xc2
[<c023ffde>] acpi_device_probe+0x37/0xcd
[<c0268688>] driver_probe_device+0x9d/0x114
[<c0268736>] __driver_attach+0x37/0x55
[<c026819d>] bus_for_each_dev+0x35/0x57
[<c026853b>] driver_attach+0x11/0x13
[<c02686ff>] __driver_attach+0x0/0x55
[<c0267c6b>] bus_add_driver+0x91/0x192
[<c02688e1>] driver_register+0x45/0x9a
[<f883e02f>] acpi_video_init+0x2f/0x4d [video]
[<c013bd62>] sys_init_module+0x1514/0x163e
[<c0240311>] acpi_bus_register_driver+0x0/0x38
[<c01064d2>] sys_mmap2+0x62/0x77
[<c01037c6>] sysenter_past_esp+0x6a/0x90
=======================
ACPI Exception (video-1707): UNKNOWN_STATUS_CODE, Cant attach device [20070126]
input: Video Bus as /class/input/input5
ACPI: Video Device [NVID] (multi-head: yes rom: no post: no)
input: Video Bus as /class/input/input6
ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
Okay, there are two problems here. One is the video hang, for which we have found the specific AML code that causes it. The other is the ACPI Exception. For the first problem, I'm afraid it's BIOS related and we cannot fix it in the Linux kernel. I don't know whether Windows XP/Vista works well on this laptop. For the second issue, please open another bug and attach the full dmesg output. I'll send a debug patch for that issue. :)
Are you sure that the problems are unrelated? To me it seems like something in the new DSDT changed the problem from "hang" to "regular ACPI exception". So with the latest DSDT I cannot reproduce a hang, just the exception. And with the original DSDT I cannot reproduce the exception, just the hang. And yes, the laptop works just fine with Windows Vista and Windows XP. But I'll open a new bug anyway :)
Yes. :) The first is a BIOS problem that I don't know how to debug further. The second problem shows up after I commented out the AML code which invokes the SMI during ACPI video initialization and recompiled the DSDT, which means this problem still exists once the video hang problem is fixed. And it's an ACPI video driver problem.
Ah, OK. I misread your previous comment. I thought you had just inserted debugging statements into the DSDT. Then it makes much more sense of course :) Any suggestions as to how I might further debug the problem, or as to someone else who might have more suggestions? It seems like there are users who were able to boot with some specific unreleased Ubuntu kernel. I'll try to find out if that's true and if I can find code changes that make a difference. But thanks a lot for your help so far!
I've opened http://bugzilla.kernel.org/show_bug.cgi?id=10683 for the other issue.
(In reply to comment #33)
> It seems like there are users who were able to boot with some specific
> unreleased Ubuntu kernel. I'll try to find out if that's true and if I can
> find code changes that make a difference.

FYI, it was the kernel that shipped on the 8.04 beta CD. If you can't find a copy anywhere else, let me know and I can put a copy of the binaries online for you, and also try to track down a copy of the source code.
Thanks Sam, I should be able to find it. So this is not the issue you mentioned earlier, that it still crashes but the module is just not loaded automatically?
Sorry, I'm not quite sure what you are asking? IIRC, on the 8.04 beta CD, the kernel module was loaded correctly, without a crash. I don't recall /sys/class/backlight being populated however. I'll check again if you want.
I'm referring to your comments above (6 and 7). I just wanted to know if you know if the Ubuntu kernel also works because the video module is just not autoloaded. If you have the Beta CD handy, could you quickly check whether video.ko is loaded and have a look in /proc/acpi/video/? That could save me a bit of time in case it boots just because it does not load the video module...
Sam, if the video driver can be loaded successfully, please attach the dmesg output, along with the results of "tree /proc/acpi/video/" and "cat /proc/acpi/video/*/*".
Created attachment 16149 [details] dmesg output from Ubuntu 8.04 beta livecd
Sorry about the delay. On the Ubuntu 8.04 beta live cd, the module does load automatically at boot time, and the system does not freeze.
Created attachment 16150 [details] output of 'tree /proc/acpi/video'
Created attachment 16151 [details] contents of files in /proc/acpi/video
OK, I did a bisection based on the Ubuntu kernel. This leads to the following commit, which reintroduces the hang after it was working correctly with the 8.04 beta kernel, as confirmed by Sam:

----------
commit 69ed7d7e1807eb9071d85861062cf23db71c2910
Author: Stefan Bader <stefan.bader@canonical.com>
Date: Wed Mar 26 12:02:47 2008 -0400

    Revert "ACPI: video: Ignore ACPI video devices that aren't present in hardware"

    Bug: #197929

    This patch is reverted because it introduced regressions on working laptops.

    http://bugzilla.kernel.org/show_bug.cgi?id=9995

    This reverts commit 50bcae78c21bf5c69d72ede68c8a00e12ab95618.

    Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
----------

So probably the commit that fixes things for us is this one:

----------
commit 50bcae78c21bf5c69d72ede68c8a00e12ab95618
Author: Matthew Garrett <mjg59@srcf.ucam.org>
Date: Thu Feb 7 01:44:06 2008 +0000

    ACPI: video: Ignore ACPI video devices that aren't present in hardware

    Vendors often ship machines with a choice of integrated or discrete graphics, and use the same DSDT for both. As a result, the ACPI video module will locate devices that may not exist on this specific platform. Attempt to determine whether the device exists or not, and abort the device creation if it do not exist.

    Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
    Signed-off-by: Len Brown <len.brown@intel.com>
----------

Both commits are Ubuntu-specific. In Linus' tree the commits are:

----------
commit f0d6752c9fa51d24c86b57c76ec5b2926a716b23
Author: Len Brown <len.brown@intel.com>
Date: Tue Mar 18 01:43:53 2008 -0400

    Revert "ACPI: video: Ignore ACPI video devices that aren't present in hardware"

    This reverts commit 3fa2cdcc45a0176de15cac9dbf4ed2834ebf8932.

    http://bugzilla.kernel.org/show_bug.cgi?id=9995

    Signed-off-by: Len Brown <len.brown>

commit 3fa2cdcc45a0176de15cac9dbf4ed2834ebf8932
Author: Matthew Garrett <mjg59@srcf.ucam.org>
Date: Thu Feb 7 01:44:06 2008 +0000

    ACPI: video: Ignore ACPI video devices that aren't present in hardware

    Vendors often ship machines with a choice of integrated or discrete graphics, and use the same DSDT for both. As a result, the ACPI video module will locate devices that may not exist on this specific platform. Attempt to determine whether the device exists or not, and abort the device creation if it do not exist.

    Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
    Signed-off-by: Len Brown <len.brown@intel.com>
----------

Zhang, does this information help you? Can I do anything else?
Hah, I see. With Matthew's patch applied, AMW0 is not loaded and AMW0._DOD is never invoked, thus the system doesn't hang. So this is still a BIOS problem to me.
Closing this bug as it's a BIOS issue.
We can still work around it in the Linux kernel, as Matthew did in commit 3fa2cdcc45a0176de15cac9dbf4ed2834ebf8932.
It seems that Thomas' patch can fix this problem as well.

*** This bug has been marked as a duplicate of bug 9614 ***