Bug 110821 - [HSW] Complete Lockup on Battery Power
Summary: [HSW] Complete Lockup on Battery Power
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: Intel Linux
Importance: P1 high
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-14 22:58 UTC by Luca Di Maio
Modified: 2016-01-18 11:36 UTC
CC List: 2 users

See Also:
Kernel Version: 4.3.3-300+
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
PC Specs (lshw) (23.35 KB, text/plain)
2016-01-14 22:58 UTC, Luca Di Maio
Dmesg (1) (580.00 KB, text/plain)
2016-01-14 23:00 UTC, Luca Di Maio
Dmesg (2) (916.00 KB, text/plain)
2016-01-14 23:00 UTC, Luca Di Maio
Journalctl (492.00 KB, text/plain)
2016-01-14 23:01 UTC, Luca Di Maio
Turbostat (132.00 KB, text/plain)
2016-01-17 09:55 UTC, Luca Di Maio
Journalctl (cleared) (370.82 KB, text/plain)
2016-01-17 09:56 UTC, Luca Di Maio

Description Luca Di Maio 2016-01-14 22:58:46 UTC
Created attachment 199651 [details]
PC Specs (lshw)

I've recently upgraded from Linux 4.2.3 to 4.3.3 on my machine (specs in attachment).
With the newest kernel, the machine becomes completely unresponsive after a while, in diverse conditions and workloads. Only hard reset works.
The bug is easily reproducible by launching 'powertop --auto-tune' in case it does not trigger on its own during normal use.
I have no problems whatsoever on 4.2.3, it's very stable and delivers great performance.
I encountered this problem both on Fedora 23 and Arch Linux using the respective stock 4.3.x kernels, hence why I decided to tag it here on mainline.

Regarding the discrete GPU, I'm using nouveau on Fedora and Nvidia's proprietary drivers plus bbswitch on Arch, while keeping it powered down in both cases, so I don't think it has any influence on the problem.

I'm also attaching 2 dmesg logs and a complete journalctl, taken on Fedora, that seem to stop when the complete lockup happens.
In addition, there is nothing unusual in /sys/class/drm/card0/error (no error state collected).
Comment 1 Luca Di Maio 2016-01-14 23:00:05 UTC
Created attachment 199661 [details]
Dmesg (1)
Comment 2 Luca Di Maio 2016-01-14 23:00:38 UTC
Created attachment 199671 [details]
Dmesg (2)
Comment 3 Luca Di Maio 2016-01-14 23:01:05 UTC
Created attachment 199681 [details]
Journalctl
Comment 4 Luca Di Maio 2016-01-17 09:55:03 UTC
Created attachment 200091 [details]
Turbostat

I'm attaching a turbostat log.
It was truncated at the moment of the freeze.
I can't see any differences between this one and the one from 4.2.3, but it may be useful.
Comment 5 Luca Di Maio 2016-01-17 09:56:46 UTC
Created attachment 200101 [details]
Journalctl (cleared)

Another journal, but this time I've used sed to strip out all the WRITE and READ block reports; it may be more readable this way.
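
For anyone who wants to repeat the same cleanup on their own logs, here is a minimal sketch of that filtering step in Python. It assumes the noisy entries contain the text "READ block" or "WRITE block"; that pattern, the function name `drop_block_reports`, and the sample lines are illustrative assumptions, not taken from the attached journal.

```python
import re

# Assumption: block I/O reports contain "READ block" or "WRITE block".
BLOCK_REPORT = re.compile(r"\b(?:READ|WRITE) block\b")


def drop_block_reports(lines):
    """Return only the lines that are not block read/write reports."""
    return [line for line in lines if not BLOCK_REPORT.search(line)]


# Hypothetical sample lines, for illustration only.
sample = [
    "kworker/0:1(42): WRITE block 1234 on sda1 (8 sectors)\n",
    "kernel: i915 loaded\n",
    "systemd(1): READ block 99 on sda2 (16 sectors)\n",
]
cleaned = drop_block_reports(sample)
print("".join(cleaned), end="")
```

To process a real file, the same function can be applied to the lines of an open file object and the result written back out.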
Comment 6 Luca Di Maio 2016-01-17 10:00:18 UTC
The problem does not seem to occur with a specific workload; it seems almost random.
I fail to see where the problem could be. If anyone needs more testing or other types of logs, I'll be more than happy to help.
Comment 7 Luca Di Maio 2016-01-17 11:42:32 UTC
Tested, and I can also confirm this behavior on 4.4 mainline on Fedora 23!
Comment 8 Luca Di Maio 2016-01-17 20:03:10 UTC
UPDATE:
Moved to DRI/Intel and lowered to high importance because it seems connected to the flag:

i915.enable_fbc=1

It's strange that it works wonderfully on 4.2.3 and freezes completely on 4.3.3+. But I can use the PC without it.

Output of "tail /sys/kernel/debug/dri/0/i915_capabilities"

has_fbc: yes
has_pipe_cxsr: no
has_hotplug: yes
cursor_needs_physical: no
has_overlay: no
overlay_needs_physical: no
supports_tv: no
has_llc: yes
has_ddi: yes
has_fpga_dbg: yes

So framebuffer compression (FBC) is supported by the hardware.
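
The same check can be scripted. Below is a minimal sketch that parses the "name: yes/no" lines shown above into a dictionary; the function name `parse_capabilities` is hypothetical, and the sample text is simply the output pasted in this comment rather than a live read of the debugfs file (which typically requires root).

```python
def parse_capabilities(text):
    """Parse 'name: yes/no' lines into a dict mapping name -> bool."""
    caps = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip anything that is not a plain yes/no capability line.
        if sep and value in ("yes", "no"):
            caps[name.strip()] = (value == "yes")
    return caps


# Output copied from the comment above.
SAMPLE = """\
has_fbc: yes
has_pipe_cxsr: no
has_hotplug: yes
cursor_needs_physical: no
has_overlay: no
overlay_needs_physical: no
supports_tv: no
has_llc: yes
has_ddi: yes
has_fpga_dbg: yes
"""

caps = parse_capabilities(SAMPLE)
print("FBC supported by hardware:", caps.get("has_fbc", False))
```

Note that this only reports what the hardware advertises; it says nothing about whether the driver enables or supports FBC on this platform.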

Still, /sys/class/drm/card0/error shows no error after a lockup.

I'm still leaving it at "High" because FBC is an important power-saving feature on a laptop.

Without it, turbostat shows a PC3 residency of only about 20-25% (and that is its maximum value); with FBC, the package stays in PC3 85-90% of the time (sadly, it can't go deeper than PC3).

This translates to about 3-4 W higher power consumption at idle or under light workloads. It does not seem to affect heavy workloads.

If I can do more to help debugging I will do it gladly!
Comment 9 Jani Nikula 2016-01-18 09:16:39 UTC
(In reply to Luca Di Maio from comment #8)
> UPDATE:
> Moved to DRI/Intel and lowered to high importance because it seems connected
> to the flag:
> 
> i915.enable_fbc=1

Specifying that module parameter should taint the kernel. If FBC is not enabled by default, it's not supported by the driver for your hardware.
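
A minimal sketch for checking both points on a running system follows. It assumes the taint in question is the TAINT_USER flag (bit 6 of /proc/sys/kernel/tainted) and that the parameter is exposed at /sys/module/i915/parameters/enable_fbc; the helper names are hypothetical, and reading the parameter file may require root.

```python
from pathlib import Path

# Assumption: unsafe module parameters set the TAINT_USER flag (bit 6).
TAINT_USER_BIT = 6


def is_user_tainted(tainted_value):
    """Return True if the TAINT_USER bit is set in the taint bitmask."""
    return bool((tainted_value >> TAINT_USER_BIT) & 1)


def read_int(path):
    """Return the integer stored in a sysfs/procfs file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


tainted = read_int("/proc/sys/kernel/tainted")
fbc = read_int("/sys/module/i915/parameters/enable_fbc")

print("i915.enable_fbc:", fbc)
print("TAINT_USER set:", None if tainted is None else is_user_tainted(tainted))
```

A value of None means the file could not be read, for example because the i915 module is not loaded or the script lacks permission.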
Comment 10 Luca Di Maio 2016-01-18 09:19:01 UTC
Yes, I've now disabled it and the freezes seem to be gone, but it may be useful to debug because from 3.18 up to 4.2.3 it always worked perfectly!

If I can help to debug, I'll be happy to!

Thanks for the answer!
