Bug 60804
Summary: | Baytrail-M & ILK mobile: Resume from S4 causes system reboot sporadically | ||
---|---|---|---|
Product: | Power Management | Reporter: | Feng, Cancan (cancan.feng) |
Component: | Hibernation/Suspend | Assignee: | Lan Tianyu (tianyu.lan) |
Status: | CLOSED WILL_FIX_LATER | ||
Severity: | high | CC: | aaron.lu, guang.a.yang, qingshuai.tian, tianyu.lan, yangweix.shui |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.11.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel loading log 1 at resume phase; kernel loading log 2 at resume phase; dmesg: Baytrail-M S4 reliability test reboot |
Description
Feng, Cancan
2013-08-28 01:42:15 UTC
(In reply to Feng, Cancan from comment #0)
> System Environment:
> --------------------------------------------
> Kernel: (drm-intel-next-queued)30815646aadf5a45da2d6c664953acfac525e22e
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Aug 20 12:56:40 2013 +0100
>
> drm/i915: Don't destroy the vma placeholder during execbuffer reservation
>
> Bug detail Description:
> --------------------------------------------
> This issue happens on ILK's mobile machine. System can suspend to disk
> successfully, but will reboot while resuming from S4 sporadically. This
> happens about 1 in 5 times. I tried 3.9, 3.8 and 3.6 kernel but can't find a
> good commit..

Please provide the output of acpidump.

Could you provide some logs? Maybe use a camera to shot the kernel log when system reboots during resuming.

> This issue also exists without i915 loaded.
>
> Reproduce step:
> --------------------------------------------
> 1. booting up machine
> 2. echo disk > /sys/power/state --> system reboot

00:00.0 Host bridge [0600]: Intel Corporation Core Processor DRAM Controller [8086:0044] (rev 02)
00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02)
00:19.0 Ethernet controller [0200]: Intel Corporation 82577LM Gigabit Network Connection [8086:10ea] (rev 05)
00:1a.0 USB Controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b3c] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio [8086:3b56] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 [8086:3b42] (rev 05)
00:1c.1 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 [8086:3b44] (rev 05)
00:1c.2 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 [8086:3b46] (rev 05)
00:1c.3 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 [8086:3b48] (rev 05)
00:1d.0 USB Controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b34] (rev 05)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev a5)
00:1f.0 ISA bridge [0601]: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller [8086:3b07] (rev 05)
00:1f.2 RAID bus controller [0104]: Intel Corporation Mobile 82801 SATA RAID Controller [8086:282a] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller [8086:3b30] (rev 05)
00:1f.6 Signal processing controller [1180]: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem [8086:3b32] (rev 05)
02:00.0 Network controller [0280]: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller [14e4:4727] (rev 01)
03:00.0 CardBus bridge [0607]: Ricoh Co Ltd Device [1180:e476] (rev 02)
03:00.1 SD Host controller [0805]: Ricoh Co Ltd MMC/SD Host Controller [1180:e822] (rev 03)
03:00.4 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd FireWire Host Controller [1180:e832] (rev 03)
3f:00.0 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers [8086:2c62] (rev 02)
3f:00.1 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:2d01] (rev 02)
3f:02.0 Host bridge [0600]: Intel Corporation Core Processor QPI Link 0 [8086:2d10] (rev 02)
3f:02.1 Host bridge [0600]: Intel Corporation Core Processor QPI Physical 0 [8086:2d11] (rev 02)
3f:02.2 Host bridge [0600]: Intel Corporation Core Processor Reserved [8086:2d12] (rev 02)
3f:02.3 Host bridge [0600]: Intel Corporation Core Processor Reserved [8086:2d13] (rev 02)

Hello,

What is ILK mobile machine, is it a laptop?

Also, please try to do some basic debugging as described in https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt?
Thanks.
(In reply to Aaron Lu from comment #3)
> Hello,
>
> What is ILK mobile machine, is it a laptop?
>
> Also, please try to do some basic debugging as described in
> https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt?
> Thanks.

Yes, it's a laptop. I followed the steps on that page:

1. # echo reboot > /sys/power/disk
   # echo disk > /sys/power/state
   The system reboots on the 2nd resume.

2. # echo devices > /sys/power/pm_test
   # echo platform > /sys/power/disk
   # echo disk > /sys/power/state
   In this testing mode, I tested each of freezer, devices, platform, processors and core 5 times, and none of these five modes failed.

3. # echo shutdown > /sys/power/disk
   # echo disk > /sys/power/state
   The system reboots on the 3rd resume.

(In reply to Lan Tianyu from comment #1)
> (In reply to Feng, Cancan from comment #0)
> > System Environment:
> > --------------------------------------------
> > Kernel: (drm-intel-next-queued)30815646aadf5a45da2d6c664953acfac525e22e
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Tue Aug 20 12:56:40 2013 +0100
> >
> > drm/i915: Don't destroy the vma placeholder during execbuffer reservation
> >
> > Bug detail Description:
> > --------------------------------------------
> > This issue happens on ILK's mobile machine. System can suspend to disk
> > successfully, but will reboot while resuming from S4 sporadically. This
> > happens about 1 in 5 times. I tried 3.9, 3.8 and 3.6 kernel but can't find a
> > good commit..
> Please provide the output of acpidump.
>
> Could you provide some logs? Maybe use a camera to shot the kernel log when
> system reboots during resuming.

It's hard to take a picture. What do you think about me recording a video and emailing it to you?

Currently, I have no good idea. So let's have a try and maybe we could find some clues.

(In reply to Lan Tianyu from comment #6)
> Currently, I have no good idea. So let's have a try and maybe we could find
> some clues.

Hmm.. I made a video, but it's too big to send, so I captured two photos of the kernel loading phase of resume. A second later, the system reboots.

Created attachment 107341 [details]
kernel loading log 1 at resume phase

Created attachment 107342 [details]
kernel loading log 2 at resume phase
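
The pm_test procedure reported earlier in this thread (each of freezer, devices, platform, processors and core tried 5 times) could be automated roughly as follows. This is only a sketch, not something used in this bug: the function names `write_sysfs` and `run_pm_test_cycles` and the `power_dir` parameter are hypothetical, and it assumes the standard `/sys/power/pm_test`, `/sys/power/disk` and `/sys/power/state` files described in basic-pm-debugging.txt. It can only count write errors as failures; a spontaneous reboot like the one in this bug would simply end the script.

```python
import os

# Test levels accepted by /sys/power/pm_test, in the order tried in this thread.
PM_TEST_MODES = ["freezer", "devices", "platform", "processors", "core"]


def write_sysfs(path, value):
    """Write a single value to a sysfs-style file."""
    with open(path, "w") as f:
        f.write(value)


def run_pm_test_cycles(power_dir="/sys/power", rounds=5, disk_mode="platform"):
    """Run `rounds` hibernation test cycles per pm_test mode.

    Returns a dict mapping each mode to the number of cycles whose
    writes completed without an OSError.
    """
    passed = {}
    write_sysfs(os.path.join(power_dir, "disk"), disk_mode)
    try:
        for mode in PM_TEST_MODES:
            passed[mode] = 0
            for _ in range(rounds):
                try:
                    write_sysfs(os.path.join(power_dir, "pm_test"), mode)
                    # With pm_test set, this runs the test cycle and returns.
                    write_sysfs(os.path.join(power_dir, "state"), "disk")
                except OSError:
                    continue
                passed[mode] += 1
    finally:
        # Leave the machine in normal (non-test) hibernation mode.
        write_sysfs(os.path.join(power_dir, "pm_test"), "none")
    return passed
```

Run as root, `run_pm_test_cycles()` would exercise the real sysfs files; pointing `power_dir` at a scratch directory allows a dry run that only records the values written.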
IVB: Apple MacBook Pro also has this issue. I tried loop running S4 on this machine; it reboots within about 10 rounds.

Now I have such a machine and am working on this bug.

The issue is reproducible on a Baytrail machine; it reboots after loop running S4 about 3 times.

Ok. I got Feng Cancan's machine and reinstalled a fresh Fedora 19. The issue occurred once after 108 S4 cycles. I will prepare a debug patch for the kernel since it's so hard to reproduce.

(In reply to shui yangwei from comment #12)
> The issue is reproducible on a Baytrail machine; it reboots after loop
> running S4 about 3 times.

Baytrail should have a serial port for debugging. Could you catch the log when it reboots?

Created attachment 108151 [details]
dmesg: Baytrail-M S4 reliability test reboot

(In reply to Lan Tianyu from comment #13)
> Ok. I got Feng Cancan's machine and reinstalled a fresh Fedora 19. The
> issue occurred once after 108 S4 cycles. I will prepare a debug patch for
> the kernel since it's so hard to reproduce.
>
> (In reply to shui yangwei from comment #12)
> > The issue is reproducible on a Baytrail machine; it reboots after loop
> > running S4 about 3 times.
> Baytrail should have a serial port for debugging. Could you catch the log
> when it reboots?

OK, I have attached the dmesg here.

Please try the following patch.

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 0b78f72..e292def 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -742,10 +742,10 @@ static int software_resume(void)
 	if (swsusp_resume_device)
 		goto Check_image;
 
-	if (!strlen(resume_file)) {
-		error = -ENOENT;
-		goto Unlock;
-	}
+//	if (!strlen(resume_file)) {
+//		error = -ENOENT;
+//		goto Unlock;
+//	}
 
 	pr_debug("PM: Checking hibernation image partition %s\n", resume_file);

(In reply to Lan Tianyu from comment #15)
> Please try the following patch.
>
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 0b78f72..e292def 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -742,10 +742,10 @@ static int software_resume(void)
>  	if (swsusp_resume_device)
>  		goto Check_image;
>
> -	if (!strlen(resume_file)) {
> -		error = -ENOENT;
> -		goto Unlock;
> -	}
> +//	if (!strlen(resume_file)) {
> +//		error = -ENOENT;
> +//		goto Unlock;
> +//	}
>
>  	pr_debug("PM: Checking hibernation image partition %s\n", resume_file);

I tested this patch on the latest -next-queued kernel of Daniel's tree. The machine doesn't reboot, but it hangs and becomes unreachable at about the 22nd round of loop running S4.

Could you attach the log?

(In reply to Lan Tianyu from comment #17)
> Could you attach the log?

I rebooted the machine and found that it resumed from S4 and continued the reliability test, so I think it hung in the suspend part of S4. I saved the dmesg, but I don't know why it contains only a few messages. I paste it below:

[ 788.893800] [drm:ironlake_panel_vdd_off_sync], PP_STATUS: 0xabcd000f PP_CONTROL: 0x80000008
[ 789.628751] ax88179_178a 1-6.2:1.0 enp0s20u6u2: ax88179 - Link status is: 1
[ 789.881664] hpet_rtc_timer_reinit: 7 callbacks suppressed
[ 789.885830] hpet1: lost 9599 rtc interrupts
[ 790.212860] hpet1: lost 9599 rtc interrupts
[ 790.558977] hpet1: lost 9599 rtc interrupts
[ 790.885549] hpet1: lost 9600 rtc interrupts
[ 791.159335] hpet1: lost 9600 rtc interrupts
[ 791.432820] hpet1: lost 9600 rtc interrupts
[ 791.709419] hpet1: lost 9600 rtc interrupts
[ 792.035632] hpet1: lost 9599 rtc interrupts
[ 792.382003] hpet1: lost 9599 rtc interrupts
[ 792.708658] hpet1: lost 9600 rtc interrupts

From this log, it is an hpet issue and not related to the PM core's hibernation code. Furthermore, S4 still works after the reboot.

(In reply to shui yangwei from comment #18)
> [ 788.893800] [drm:ironlake_panel_vdd_off_sync], PP_STATUS: 0xabcd000f
> PP_CONTROL: 0x80000008
> [ 789.628751] ax88179_178a 1-6.2:1.0 enp0s20u6u2: ax88179 - Link status is:
> 1
> [ 789.881664] hpet_rtc_timer_reinit: 7 callbacks suppressed
> [ 789.885830] hpet1: lost 9599 rtc interrupts
> [ 790.212860] hpet1: lost 9599 rtc interrupts

Try unsetting CONFIG_HPET_EMULATE_RTC in your kernel config and see what happens.

Hi Yangwei:
Could you check the machine's swap partition and test with kernel param "resume=(swap partition e.g /dev/sda3)" ?

(In reply to Lan Tianyu from comment #21)
> Hi Yangwei:
> Could you check the machine's swap partition and test with kernel
> param "resume=(swap partition e.g /dev/sda3)" ?

Yeah, I tested it just as you said: loop running S4 for 120 rounds, all passed. Maybe this kernel parameter really worked.

This proves the hibernation function works, since this param makes the kernel itself start the hibernation resume. Originally, the hibernation resume was triggered by the initrd. So I think the initrd was abnormal and didn't trigger the hibernation resume. The reboot is also suspicious and does not seem to be an ordinary reboot, because the log doesn't show any reboot messages.

Tianyu, any update or idea for this bug?

Sorry, I currently have no idea what user space did to trigger the abnormal reboot.

Since this bug is hard to root cause and may be triggered by the BIOS (Baytrail-M is still under development and its BIOS is not stable), I am marking this bug as WILL_FIX_LATER.
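
For anyone repeating the `resume=` experiment from the comments above, a quick sanity check is to confirm that the kernel command line actually names an active swap partition. The helper below is a minimal sketch under stated assumptions: the function names are hypothetical, it only handles a plain device path such as /dev/sda3 (not `UUID=` or `PARTUUID=` forms, which would need resolving first), and it expects text in the format of /proc/cmdline and /proc/swaps.

```python
def parse_resume_param(cmdline):
    """Return the value of the last resume= token on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("resume="):
            value = token[len("resume="):]
    return value


def parse_swap_partitions(proc_swaps_text):
    """Return device paths of swap partitions from /proc/swaps-style text."""
    devices = []
    for line in proc_swaps_text.splitlines()[1:]:  # first line is the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "partition":
            devices.append(fields[0])
    return devices


def resume_points_at_swap(cmdline, proc_swaps_text):
    """True if resume= is set to a device path that is an active swap partition."""
    resume = parse_resume_param(cmdline)
    return resume is not None and resume in parse_swap_partitions(proc_swaps_text)
```

On a running system the two inputs would come from reading /proc/cmdline and /proc/swaps; a False result would suggest the kernel is not being told directly where the hibernation image lives and resume is being left to the initrd.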