Bug 42212
Created attachment 71232
Log after suspending with 3.0.3
Created attachment 71242
Log after suspending with 3.0.4 (missing the suspend part; the log was not synced)
Created attachment 71252
pm-suspend-3.0.4.log
Created attachment 71262
pm-suspend-3.0.3.log
Created attachment 71272
Log after suspending with 3.0.3
Linux benjarobin-asus 3.0-ARCH #1 SMP PREEMPT Wed Aug 17 20:24:07 UTC 2011 i686 Genuine Intel(R) CPU U7300 @ 1.30GHz GenuineIntel GNU/Linux

The Nvidia card is shut down (I have tried with and without the module that disables the graphics card). I have tried with and without Xorg started (Intel with KMS).

Any chance you can run 'git bisect' on the kernel to determine exactly which patch caused the problem between 3.0.3 and 3.0.4?

I am not going to compile the kernel on this computer (the processor is too slow and the Internet connection is quite bad where I am). However, I will be able to compile the patched kernel on my main PC (a Core i7 with a high-speed Internet connection), but not before 5-6 September.

I managed to run the compilation over ssh on my main PC. I could not believe the result of the bisect; the commit that introduced the regression doesn't look related to my problem:

    commit 512228f0be3af44bf5cf6cc5750ddd279bbedaf3
    Author: Andi Kleen <ak@linux.intel.com>
    Date:   Fri Aug 19 16:15:10 2011 -0700

        Add a personality to report 2.6.x version numbers

I have attached the steps of the bisect. I compiled twice and checked that I was running the right kernel.

Created attachment 71612
bisect from kernel version 3.0.3 to 3.0.4
Does reverting that patch help? Do you have that personality activated (i.e. what config are you using)?

Reverting that patch fixes the problem. What I did: I use the ArchLinux configuration files (http://projects.archlinux.org/svntogit/packages.git/tree/repos/core-i686?h=packages/linux) and build with the PKGBUILD script. I just added one line to the PKGBUILD file, after the line that applies ftp://ftp.kernel.org/pub/linux/kernel/v3.0/patch-3.0.4.gz:

    patch -Rp1 -i "${srcdir}/personality-report-2.6.patch"

My kernel command line:

    root=/dev/disk/by-uuid/e77bc280-de5d-464d-8790-05b5f1598505 ro nouveau.modeset=0 quiet

Linux benjarobin-asus 3.0-ARCH #1 SMP PREEMPT Sun Sep 4 20:12:39 CEST 2011 i686 Genuine Intel(R) CPU U7300 @ 1.30GHz GenuineIntel GNU/Linux

Created attachment 71692
Configuration of the running kernel 3.0.4
Created attachment 71702
Test patch that replaces the personality modification
Even more interesting: if, instead of applying "Add a personality to report 2.6.x version numbers", we apply the attached patch (no change to personality.h, only a minor change to sys.c), resume still fails.
I can only see two possibilities: the "name" or the "current" variable is NULL or an invalid pointer.
Created attachment 71712
Test2 patch that replaces the personality modification
New test: this one is slightly different from the previous one, because it does not use current->personality. This test passes.
So maybe current is NULL. How can I check that? Something like:

    if (current == NULL) {
        printk(KERN_CRIT "current is NULL\n");
        return 0;
    }
Hmm, but newuname already uses personality, at least on 64-bit kernels. Also, you should see an oops if a NULL pointer were dereferenced. And I don't see why the low-level suspend code would call uname anyway. This doesn't make much sense. Just to be sure: when you suspend the unpatched kernel multiple times in a row, does it always fail?

Yes, I have tested many times (more than 20) and it always fails to resume. I ran another test and I am disappointed: it fails to resume with this function:

    static int override_release(char __user *release, int len)
    {
        int ret = 0;

        /* %p is the printk format for pointers; %d expects an int */
        printk(KERN_WARNING "*** Test 'current' value %p ***\n", current);
        if (current == NULL) {
            printk(KERN_CRIT "*** Test 'current' is NULL ***\n");
            return 0;
        }
        return ret;
    }

It looks like just reading the value of current is enough to make the resume fail.

Does the problem still exist in the latest upstream kernel?

It looks like the problem doesn't exist anymore, BUT maybe the bug is just hidden: I had some problems after suspend with the swapoff command (around kernel 3.1), but since kernel 3.1.6 I haven't noticed any problem after a suspend :-)

I am adding this comment, first to request that this ticket be reopened, and second to explain what I discovered with kernel 3.1 a few months ago:

After upgrading to Linux kernel 3.1, suspend works properly with this kernel version compiled using these ArchLinux scripts: http://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux&id=20e846c85c47e5593afb5d67d5fb8fc6907d727e
I am able to go to sleep and wake up without any problem. After a fresh start (without suspending) I can run swapoff -a without any problem. But if the PC goes to sleep with pm-suspend, then after the wake-up, running swapoff -a makes the kernel log a non-fatal error. (The kernel log will be attached as 'kernel_trace_3.1.log'.)
***************************************
Today, with kernel 3.2, I noticed some random kernel errors only after suspending, but they still look related to swap. (The kernel log will be attached as 'kernel_trace_3.2.log'.)

Created attachment 72358
kernel 3.1 trace
Created attachment 72359
kernel 3.2 trace
Please try 3.6 when it comes out; it has some relevant fixes.

ping

Hi everybody,

I ran a new test on a fresh install of ArchLinux with just the strict minimum of packages (using systemd) on my Asus UL80VT. This computer has 2 graphics cards (Intel + Nvidia). The running kernel is:

Linux ben 3.6.7-1-ARCH #1 SMP PREEMPT Mon Nov 19 09:11:44 CET 2012 i686 GNU/Linux

First I tried without blacklisting any kernel module, then I tried without KMS, i915, nouveau... Same result in both cases and the same error in the log: after I press a key, the PC tries to resume from suspend. The screen stays black: there is no backlight without KMS, and with Intel KMS the backlight is on but nothing is displayed. The kernel seems to be alive, but there are traces/errors in the log. Logs and details will be attached.

Benjamin

Created attachment 87631
The log, starting at boot...
Created attachment 87641
The kernel config (3.6.7)
Created attachment 87651
The modules loaded and blacklisted
Looks like the kernel resumed fine; when user-space tasks were restarted, systemd crashed the kernel. Does this problem still exist in the latest upstream kernel? Thanks.

Hi benjarobin,

Any update?

(In reply to comment #30)
> Hi benjarobin,
>
> Any update?

Hi,

Well, depending on the kernel version I can "resume" or not. In the best case, as with kernel 3.7.*, I can use the already-started applications a little. If I try to start a new application, it may fail with a kernel error (no kernel panic), and if I am lucky I can shut down the PC properly. In the worst case, as with 3.8.4-1-ARCH, I just get a black screen (failed to resume?) and nothing in the log. Each time, the tests were done with and without the KMS, intel and nouveau modules...

I really want to help, but for that I need some advice on how to debug it (I am a software developer working with embedded devices). I think I can start with a kernel version where the resume "works" (I get a working tty), but I don't know what to do with the kernel error... Thanks.

Created attachment 97091
The kernel log after a resume (3.7.9)
As you can see, there is a "NULL pointer dereference" inside the kswapd0 kernel process. But I am sure the bug is not inside kswapd0, since after each kernel upgrade the problem is triggered by something else.
Please boot with init=/bin/sh, then suspend/resume and see if this works. Thanks.

Created attachment 97231
dmesg: init=/usr/bin/bash -> suspend -> resume
I ran the new tests as you asked, booting with init=/usr/bin/bash, and used pm-suspend.
If I do not load the intel (i915) driver, the backlight does not "resume" (it is still off after resume). To get the log I ran this command:
$ sync; pm-suspend ; sleep 2 ; dmesg &> /test.log ; sync
The log for this command is attached: same error as in the previous log "97091: The kernel log after a resume (3.7.9)".
If I load the intel (i915) driver, the backlight is on after resume, and I still get exactly the same error.
Hi Benjarobin,

Thanks for the test. I don't have any idea what happened here; perhaps you can try to bisect the problem. When bisecting, I suggest you always use init=/usr/bin/bash, and I think you can start by finding a kernel that resumes.

BTW, I think you can follow Documentation/power/basic-pm-debugging.txt to identify the problem. Start from freezer like this:

    # cd /sys/power
    # echo freezer > pm_test
    # echo mem > state

and proceed to the next level until a problem occurs; this may give us some hints. Thanks.

Created attachment 97891
The 5 tests of basic PM debugging
Hi,
First of all, thanks a lot for your support.
For the bisect, I already did it (in 2011; take a look at the history of this bug report) and only found an irrelevant commit...
Maybe this is a hardware issue, but then why do I only see it with 32-bit Linux, and only after a suspend? I have no problems or errors in the log if I don't use suspend...
I followed the documentation, but every test was successful... The last one, "core", works without any problem.
I attached the log of these 5 tests.
I tried s2ram, and also just "echo none > pm_test; echo mem > state": same result and same error as in the previous log "97091: The kernel log after a resume (3.7.9)".
I tried updating the kernel to the latest release (3.8.6) and got a kernel panic (Num Lock flashing) during resume; the backlight is off, so I cannot see the trace or log it to disk.
Maybe I can set up a UDP console; I have no idea whether this is possible, or whether it is too early in the resume to obtain anything...
(In reply to comment #37)
> Created an attachment (id=97891)
> The 5 tests of basic PM debugging
>
> Hi,
>
> First of all, thanks a lot for your support.
>
> For the bisect, I already did it (in 2011, take a look at the history of this
> bug report), and only found a non relevant commit...

Yes, I saw that, but it looks to me like the problem you are hitting has changed over time...

> Maybe this is a hardware issue, but why I can only see it with a linux 32 bits
> and after a suspend. I do not have any problem or errors in the log if I am not
> using suspend...

This sounds relevant.

> I did follow the documentation, but each test were successful... The last one
> "core" is working without any problem.
> I attached the log of these 5 tests.
>
> I did try to use s2ram, or just "echo none > pm_test; echo mem > state" => same
> result and same error that previous log "97091: The kernel log after a resume
> (3.7.9)"
>
> I did try to update the kernel to the latest release (3.8.6), and I did have a
> kernel panic (numlock flashing) during resume : backlight off so I cannot see
> the trace or log it to the disk.
> Maybe I can setup a UDP console, I have no idea if this is possible or this is
> too early to obtain something...

It feels like some memory gets corrupted by the power cycle during suspend. Can you try removing some memory and re-testing?

I am deeply sorry... This is a hardware issue. I was hoping from the beginning that it was a software bug that only hits 32-bit Linux; Windows and Linux x86_64 work fine. Why didn't I run these tests before (1.5 years...)?
- I tried with only 2 GB of RAM (half of it), and it works in both cases: with only the first RAM stick, and with only the second.
- By default the Asus UL80VT is overclocked from 1300 MHz to 1733 MHz. If I disable the turbo option from Windows (there is no such option in the BIOS, where I can only raise the frequency by 5%; I would need to contact support for that), suspend works fine, with no error in the kernel log...

Sorry again for the time you spent helping me, and thanks again for your support. But may I ask one more question: what is your advice? Try another RAM stick? Or just lose 30% of performance whenever I want to use suspend (which requires booting into Windows to change the setting)? Or maybe you have better advice?

Regards,
Benjamin Robin

(In reply to comment #39)
> But may I ask one more question.
> What it is your advice : Trying with another ram bar ? Or just losing 30% of
> performance if I want to use suspend, but for that I need to boot under
> Windows...
> Or maybe you have a better advice ?

Why not use 64-bit Linux? It's so common today that running 32-bit feels outdated :-)
Created attachment 71212
lspci -vv

Regression from kernel version 3.0.3: I cannot resume the box after a successful suspend with 3.0.4, with the following configuration: