Bug 94551 - System reboots on graphical load with Intel HD 4000 iGPU since Linux 3.10.
Summary: System reboots on graphical load with Intel HD 4000 iGPU since Linux 3.10.
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: Intel Linux
Importance: P3 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
Reported: 2015-03-08 17:02 UTC by rasmus
Modified: 2015-10-20 08:24 UTC

Kernel Version: 3.18
Regression: No

Attachments
bios_settings.txt: my bios settings (3.25 KB, text/plain)
2015-03-08 17:02 UTC, rasmus
dmesg_fedora_19.txt: dmesg without reboot (71.51 KB, text/plain)
2015-03-08 17:04 UTC, rasmus
dmesg_fedora_21.txt: dmesg when crashing (via netconsole) (204.61 KB, text/plain)
2015-03-08 17:04 UTC, rasmus
iomem_fedora_21.txt (2.94 KB, text/plain)
2015-03-08 17:05 UTC, rasmus
iomem_fedora_19.txt (2.82 KB, text/plain)
2015-03-08 17:06 UTC, rasmus
ioports_fedora_19.txt (1.23 KB, text/plain)
2015-03-08 17:06 UTC, rasmus
ioports_fedora_21.txt (1.28 KB, text/plain)
2015-03-08 17:07 UTC, rasmus
lspci_fedora_19.txt (30.98 KB, text/plain)
2015-03-08 17:07 UTC, rasmus
lspci_fedora_21.txt (31.10 KB, text/plain)
2015-03-08 17:08 UTC, rasmus
modules_fedora_19.txt (5.12 KB, text/plain)
2015-03-08 17:08 UTC, rasmus
modules_fedora_21.txt (6.05 KB, text/plain)
2015-03-08 17:09 UTC, rasmus
ver_linux_fedora_19.txt (9.73 KB, text/plain)
2015-03-08 17:09 UTC, rasmus
ver_linux_fedora_21.txt (10.14 KB, text/plain)
2015-03-08 17:10 UTC, rasmus
xorg_crash_fedora_21.txt: An example of an Xorg log when crashing. (41.71 KB, text/plain)
2015-03-08 17:10 UTC, rasmus

Description rasmus 2015-03-08 17:02:15 UTC
Created attachment 169731
bios_settings.txt: my bios settings

1 Full description of the problem/report
════════════════════════════════════════

  My Thinkpad W530 with HD 4000 graphics crashes and reboots whenever
  the iGPU is exposed to moderate load, e.g. playing a video game or
  movies.  This only happens when I use the Intel iGPU.  It happens
  irrespective of whether Nvidia Optimus is enabled.

  It seems this error was introduced between Linux 3.9 and 3.10.  Fedora
  19 is stable.  Fedora 20 is not.  I have tested 3.9 with the current
  Xorg stack on Archlinux and it is also stable.  3.10 is not.  Windows
  7 is also stable.

  It’s not a hardware issue!  I can run mprime cpu load indefinitely on
  the system without a crash.  The temperature never goes above 90
  degrees when I run any test.  This problem is confirmed on two W530
  machines.

  It does not seem to be an Intel driver bug, as the current Xorg stack
  works with Linux 3.9.  A bug was first filed against the Intel
  drivers, see section 7.2.

  Logs are produced with clean Fedora images.


2 Keywords
══════════

  hard crash, Kernel, Thinkpad, Intel graphics, iGPU


3 Kernel version (from /proc/version)
═════════════════════════════════════

3.1 Fedora 21
─────────────

  Linux version 3.18.7-200.fc21.x86_64
  (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.9.2
  20141101 (Red Hat 4.9.2-1) (GCC) ) #1 SMP Wed Feb 11 21:53:17 UTC 2015


3.2 Fedora 19
─────────────

  Linux version 3.9.5-301.fc19.x86_64
  (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.8.1
  20130603 (Red Hat 4.8.1-1) (GCC) ) #1 SMP Tue Jun 11 19:39:38 UTC 2013


4 Output of Oops.. message (if applicable) with symbolic information
════════════════════════════════════════════════════════════════════

  I don’t see Oops messages…  Please see the enclosed files.


5 How to reproduce
══════════════════

  1. Download gputest from [http://www.geeks3d.com/gputest/]
  2. Run start_furmark_windowed_1024x640.sh.  Put it in full screen if
     you like.
  3. On my system the computer crashes and reboots typically within 10
     minutes.
     The reboot is as if power was cut and returned, as if the CPU was
     overheating (which it is not).  IOW: the systemctl shutdown logs
     are not displayed.
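
Because the reboot leaves no shutdown logs, it can help to record temperatures while the test runs and sync every reading to disk, so the last value before the power cut survives. The following is only a minimal sketch, not something used in this report. It assumes the usual sysfs layout (/sys/class/thermal/thermal_zone*/temp, in millidegrees Celsius); the function names and the log file name are hypothetical.

```python
import glob
import os
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius as a plain integer
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some zones may be unreadable; skip them
    return temps


def log_temps(logfile, base="/sys/class/thermal", interval=1.0, samples=None):
    """Append timestamped readings to logfile, syncing each line to disk.

    With samples=None the loop runs until interrupted (or until the
    machine reboots).
    """
    count = 0
    with open(logfile, "a") as out:
        while samples is None or count < samples:
            temps = read_zone_temps(base)
            line = " ".join(f"{zone}={temp:.1f}" for zone, temp in temps.items())
            out.write(f"{time.time():.0f} {line}\n")
            out.flush()
            os.fsync(out.fileno())  # force the line onto disk before the next sample
            count += 1
            if samples is None or count < samples:
                time.sleep(interval)
```

Calling log_temps("gpu_temps.log") in a second terminal before starting the Furmark script would leave the last readings in gpu_temps.log after the reboot. Which thermal zone corresponds to the CPU package or the GPU differs between machines and is not covered here.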


6 Environment
═════════════

  Thinkpad W530?  (I don’t understand this question).


6.1 Software (add the output of the ver_linux script here)
──────────────────────────────────────────────────────────

6.1.1 Fedora 21
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Please see ver_linux_fedora_21.txt.


6.1.2 Fedora 19
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Please see ver_linux_fedora_19.txt.


6.2 Processor information (from /proc/cpuinfo)
──────────────────────────────────────────────

6.2.1 Fedora 21
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  I believe this is part of ver_linux_fedora_21.txt.


6.2.2 Fedora 19
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  I believe this is part of ver_linux_fedora_19.txt.


6.3 Module information (from /proc/modules)
───────────────────────────────────────────

6.3.1 Fedora 21
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Please see modules_fedora_21.txt


6.3.2 Fedora 19
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Please see modules_fedora_19.txt


6.4 Loaded driver and hardware information (/proc/ioports, /proc/iomem)
───────────────────────────────────────────────────────────────────────

6.4.1 Fedora 21
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

6.4.1.1 /proc/ioports
┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

  Please see ioports_fedora_21.txt.


6.4.1.2 /proc/iomem
┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

  Please see iomem_fedora_21.txt.


6.4.2 Fedora 19
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

6.4.2.1 /proc/ioports
┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

  Please see ioports_fedora_19.txt.


6.4.2.2 /proc/iomem
┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

  Please see iomem_fedora_19.txt.


6.5 PCI information (‘lspci -vvv’ as root)
──────────────────────────────────────────

6.5.1 Fedora 21
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Please see lspci_fedora_21.txt


6.5.2 Fedora 19
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Please see lspci_fedora_19.txt


6.6 SCSI information (from /proc/scsi/scsi)
───────────────────────────────────────────

6.6.1 Fedora 21
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Attached devices:
  Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: ATA Model: Crucial_CT256MX1 Rev: MU01
    Type: Direct-Access ANSI SCSI revision: 05
  Host: scsi1 Channel: 00 Id: 00 Lun: 00
    Vendor: MATSHITA Model: DVD-RAM UJ8C0 Rev: SB01
    Type: CD-ROM ANSI SCSI revision: 05


6.6.2 Fedora 19
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Attached devices:
  Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: ATA Model: Crucial_CT256MX1 Rev: MU01
    Type: Direct-Access ANSI SCSI revision: 05
  Host: scsi1 Channel: 00 Id: 00 Lun: 00
    Vendor: MATSHITA Model: DVD-RAM UJ8C0 Rev: SB01
    Type: CD-ROM ANSI SCSI revision: 05
  Host: scsi6 Channel: 00 Id: 00 Lun: 00
    Vendor: Kingston Model: DataTraveler 3.0 Rev: PMAP
    Type: Direct-Access ANSI SCSI revision: 06


6.7 Other information that might be relevant to the problem
───────────────────────────────────────────────────────────

  Not that I am aware of.


7 Other notes, patches, fixes, workarounds
══════════════════════════════════════════

7.1 Other enclosures
────────────────────

  dmesg_fedora_21.txt: netconsole output of Fedora 21 with crash.
  dmesg_fedora_19.txt: dmesg after running Furmark for a bit in
                       Fedora 19.  (My understanding is that this should
                       be similar to the netcat output)
  xorg_crash_fedora_21.txt: an example of an Xorg-log when the server
                            is crashing.
  bios_settings.txt: settings in the bios with some notes.


7.2 Other iterations of this bug report
───────────────────────────────────────

  This bug was first reported here:
  [https://bugs.freedesktop.org/show_bug.cgi?id=89451]

  It has been discussed here:
  [http://forum.thinkpads.com/viewtopic.php?f=70&t=116472]
Comment 1 rasmus 2015-03-08 17:04:05 UTC
Created attachment 169741
dmesg_fedora_19.txt: dmesg without reboot
Comment 2 rasmus 2015-03-08 17:04:55 UTC
Created attachment 169751
dmesg_fedora_21.txt: dmesg when crashing (via netconsole)
Comment 3 rasmus 2015-03-08 17:05:52 UTC
Created attachment 169761
iomem_fedora_21.txt
Comment 4 rasmus 2015-03-08 17:06:19 UTC
Created attachment 169771
iomem_fedora_19.txt
Comment 5 rasmus 2015-03-08 17:06:47 UTC
Created attachment 169781
ioports_fedora_19.txt
Comment 6 rasmus 2015-03-08 17:07:16 UTC
Created attachment 169791
ioports_fedora_21.txt
Comment 7 rasmus 2015-03-08 17:07:49 UTC
Created attachment 169801
lspci_fedora_19.txt
Comment 8 rasmus 2015-03-08 17:08:11 UTC
Created attachment 169811
lspci_fedora_21.txt
Comment 9 rasmus 2015-03-08 17:08:38 UTC
Created attachment 169821
modules_fedora_19.txt
Comment 10 rasmus 2015-03-08 17:09:06 UTC
Created attachment 169831
modules_fedora_21.txt
Comment 11 rasmus 2015-03-08 17:09:42 UTC
Created attachment 169841
ver_linux_fedora_19.txt
Comment 12 rasmus 2015-03-08 17:10:05 UTC
Created attachment 169851
ver_linux_fedora_21.txt
Comment 13 rasmus 2015-03-08 17:10:41 UTC
Created attachment 169861
xorg_crash_fedora_21.txt: An example of an Xorg log when crashing.
Comment 14 rasmus 2015-03-08 17:14:13 UTC
BTW: I don't know if this bug is filed wrongly; I'm only a user of the Linux kernel.

Please let me know if there is any more information I can add to improve the bug report.
Comment 15 Zhang Rui 2015-03-10 00:52:27 UTC
can you please check if the problem still exists if you
1. add "nomodeset" kernel option
or
2. disable the graphics driver by setting CONFIG_DRM_I915=n?
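
When testing option 1, it is worth confirming that the parameter actually reached the running kernel. The following is a minimal sketch that parses a kernel command line such as the contents of /proc/cmdline. The helper names are hypothetical, and splitting on whitespace is a simplification that does not handle quoted values containing spaces.

```python
def kernel_params(cmdline):
    """Split a kernel command line into {name: value}, value None for bare flags."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_param(cmdline, name):
    """Return True if the given parameter name appears on the command line."""
    return name in kernel_params(cmdline)
```

On a running system, has_param(open("/proc/cmdline").read(), "nomodeset") should return True only if the option was passed at boot.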
Comment 16 rasmus 2015-03-10 08:16:19 UTC
> 1. add "nomodeset" kernel option

It seems completely stable.  I've run with nomodeset for six to seven hours without any problems.
Comment 17 rasmus 2015-03-10 08:18:35 UTC
> I've run with nomodeset for six to seven hours without any problems.
         ^^^
          it
Comment 18 Zhang Rui 2015-03-10 08:29:04 UTC
This seems to be an i915 issue to me. Reassigning to the graphics experts.
Comment 19 rasmus 2015-03-10 08:36:53 UTC
Sounds fair.  But: I initially reported the bug on Freedesktop/Intel DRM.  I don't know if it's the same people (same general mailing list, though), but Chris Wilson (of Intel) said:

> Your dmesg does not show a controlled shutdown. A GPU hang, even a low-level
> hardware hang, should not result in the machine rebooting. Your dmesg does
> show that the kernel disagrees with the ACPI firmware implementation and that
> it's actively managing the thermal throttling. At this point, your best bet is
> to bisect the kernel and see where that leads.
Comment 20 Zhang Rui 2015-03-10 08:52:24 UTC
Yes, I saw that.
There are indeed some ACPI related warning messages.
[ 2355.912119] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140926/nsarguments-95)
[ 2355.912786] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140926/nsarguments-95)
[ 2357.181414] thinkpad_acpi: EC reports that Thermal Table has changed
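
To see how often such warnings recur in a long netconsole capture, the log can be summarized mechanically. The following is a minimal sketch, assuming the "[ seconds.micros] message" line format shown in the excerpt above; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "[ 2357.181414] thinkpad_acpi: EC reports ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return a list of (timestamp_seconds, message) for each recognized line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def count_acpi_warnings(entries):
    """Count identical 'ACPI Warning:' messages, ignoring their timestamps."""
    return Counter(msg for _, msg in entries if msg.startswith("ACPI Warning:"))
```

Running parse_dmesg over the attached dmesg_fedora_21.txt and passing the result to count_acpi_warnings would show whether the _DSM warning appears only around the crash or throughout the session.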

But the ACPI warning is generated when the graphics _DSM is invoked, and the thinkpad_acpi one sounds like a platform driver issue.
Thus I don't see what we Linux/ACPI can do for this issue.

IMO, the fact that nomodeset makes the problem disappear is sufficient to show that this is a graphics issue. But to double check if it is a thermal issue, you can build your kernel with CONFIG_THERMAL=n, boot without the nomodeset parameter, and check if the problem still exists.
Comment 21 rasmus 2015-03-10 09:05:06 UTC
[Sorry about the malformed quotes above].

> thinkpad_acpi one sounds like a platform driver issue.

I tried to install tp_smapi.  No change.  I think I tried to blacklist thinkpad_acpi with no change, but I'm not 100% sure on that one.

>  Thus I don't see what we Linux/ACPI can do for this issue.

OK.  Fair enough.

> to double check if it is a thermal issue

I can do that.  I'm currently bisecting between 3.9 and 3.10.
From eyeballing, the temperature was exactly the same with nomodeset
and without it, but I will test this more carefully.
Comment 22 Jani Nikula 2015-03-10 10:26:23 UTC
(In reply to rasmus from comment #21)
> I tried to install tp_smapi.  No change.  I think I tried to blacklist
> thinkpad_acpi with no change, but I'm not 100% sure on that one.

Please do try to double check the thinkpad_acpi (and possibly other thinkpad_*) platform driver.
Comment 23 rasmus 2015-03-10 10:32:58 UTC
> Please do try to double check the thinkpad_acpi

Is it sufficient to blacklist thinkpad_acpi (i.e. after init)?  Or do I need to somehow remove it at compile time?

>  (and possibly other thinkpad_*) platform driver.

Are there any others in the kernel?  On my X200s (which is what I have here) lsmod | grep -i think only gives me thinkpad_acpi.

Thanks,
Rasmus
Comment 24 Jani Nikula 2015-03-10 14:01:49 UTC
(In reply to rasmus from comment #23)
> Is it sufficient to blacklist thinkpad_acpi (i.e. after init)?  Or do I need
> to somehow remove it at compile time?

That should be enough, unless it is built-in.

> Are there any others in the kernel?  On my X200s (which is what I have here)
> lsmod | grep -i think only gives me thinkpad_acpi.

Probably not, I was thinking of some other laptop.
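
To verify after a reboot that a blacklisted module really is not loaded, the lsmod check quoted above can be approximated as follows. This is a minimal sketch that reads text in the /proc/modules format, where the module name is the first whitespace-separated field; the function names are hypothetical. Unlike grep on lsmod output, it matches module names only, not the "used by" column.

```python
def module_names(proc_modules_text):
    """Return the module names listed in /proc/modules-style text."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])  # first field is the module name
    return names


def matching_modules(proc_modules_text, needle):
    """Return loaded module names containing needle, case-insensitively."""
    needle = needle.lower()
    return [name for name in module_names(proc_modules_text) if needle in name.lower()]
```

For example, matching_modules(open("/proc/modules").read(), "think") returning an empty list would indicate that thinkpad_acpi is not loaded as a module. It says nothing about drivers built into the kernel, which never appear in /proc/modules.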
Comment 25 rasmus 2015-03-11 08:55:36 UTC
> Please do try to double check the thinkpad_acpi platform driver.

The bug happens irrespective of whether thinkpad_acpi is enabled or not on 3.18 (the default Fedora 21 kernel).

My bisect has not revealed anything yet.
Comment 26 rasmus 2015-03-11 09:35:41 UTC
(In reply to Zhang Rui from comment #20)

> but to double check if it is a thermal issue, you can
> build your kernel with CONFIG_THERMAL=n and boot without nomodeset parameter
> and check if the problem still exists.

The computer still crashes.  I don't know what that means though.  I guess with CONFIG_THERMAL the bios 'heuristics' for temperature are followed.
Comment 27 Jani Nikula 2015-08-18 13:42:54 UTC
In this case I don't think it's conclusive that nomodeset working makes it an i915 bug. If it's a thermal shutdown, merely using vs. not using the GPU might make the difference.

That said, is this still an issue with current upstream kernels?
Comment 28 rasmus 2015-08-23 15:23:28 UTC
Hi,


> In this case I don't think it's conclusive that nomodeset working makes it an
> i915 bug. If it's a thermal shutdown merely using vs. not using the gpu might
> make the difference.

But then why doesn't it happen on Windows?

> That said, is this still an issue with current upstream kernels?

It's a lot better now.  I did experience one crash like this recently,
though, but it seems to be a lot harder to trigger.

But I think some of this bug remains.

Rasmus
Comment 29 Jani Nikula 2015-10-20 08:24:27 UTC
(In reply to rasmus from comment #28)
> It's a lot better now.  I did experience one crash like this recently,
> though, but it seems to be a lot harder to trigger.

That's good...

> But I think some of this bug remains.

...while that's not so good. However, we don't see this in our testing, and I'm not sure what we could do about this. I'm closing this as unreproducible.

However, if the problem persists and keeps annoying you, please file a new bug at [1] where we're migrating all of the graphics bugs. Thanks.

[1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel
