Bug 177701

Summary: warning in intel_dp_aux_transfer
Product: Platform Specific/Hardware
Reporter: Mihai Donțu (mihai.dontu)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED MOVED
Severity: normal
CC: Martin, mihai.dontu, regressions, ziegler
Priority: P1
Hardware: Intel
OS: Linux
URL: https://bugs.freedesktop.org/show_bug.cgi?id=97344
Kernel Version: 4.9-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: 4.9-rc1 dmesg
             4.9-rc1 konsole screenshot

Description Mihai Donțu 2016-10-16 05:11:52 UTC
Created attachment 241791
4.9-rc1 dmesg

I'm getting this with 4.9-rc1:

[    3.750587] ------------[ cut here ]------------
[    3.750592] WARNING: CPU: 0 PID: 4 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1ed/0x230
[    3.750593] WARN_ON(!msg->buffer != !msg->size)
[    3.750593] Modules linked in:
[    3.750595] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.9.0-rc1 #3
[    3.750596] Hardware name: Dell Inc. Latitude E7440/07F3F4, BIOS A18 04/28/2016
[    3.750599] Workqueue: events i915_hotplug_work_func
[    3.750601]  ffffba1bc18cbb30 ffffffff935817ca ffffba1bc18cbb80 0000000000000000
[    3.750602]  ffffba1bc18cbb70 ffffffff9314f5bb 00000426c7852940 ffffba1bc18cbc58
[    3.750603]  ffff95a7cc3d40e0 0000000000000003 00000000fffffffb 0000000000000000
[    3.750612] Call Trace:
[    3.750615]  [<ffffffff935817ca>] dump_stack+0x4d/0x63
[    3.750616]  [<ffffffff9314f5bb>] __warn+0xcb/0xf0
[    3.750617]  [<ffffffff9314f63f>] warn_slowpath_fmt+0x5f/0x80
[    3.750619]  [<ffffffff9379a16e>] ? intel_dp_aux_transfer+0xde/0x230
[    3.750620]  [<ffffffff9379a27d>] intel_dp_aux_transfer+0x1ed/0x230
[    3.750622]  [<ffffffff936b2e12>] drm_dp_dpcd_access+0x72/0x110
[    3.750624]  [<ffffffff936b2ecb>] drm_dp_dpcd_write+0x1b/0x20
[    3.750625]  [<ffffffff937957fb>] intel_dp_start_link_train+0x2cb/0x4c0
[    3.750626]  [<ffffffff93796db9>] intel_dp_check_link_status+0xd9/0x110
[    3.750627]  [<ffffffff9379b5ab>] intel_dp_long_pulse+0x40b/0xb10
[    3.750628]  [<ffffffff9379bd55>] intel_dp_detect+0xa5/0xb0
[    3.750629]  [<ffffffff9378455e>] i915_hotplug_work_func+0x1de/0x2b0
[    3.750631]  [<ffffffff93168f65>] process_one_work+0x1e5/0x470
[    3.750632]  [<ffffffff93169238>] worker_thread+0x48/0x4e0
[    3.750633]  [<ffffffff931691f0>] ? process_one_work+0x470/0x470
[    3.750634]  [<ffffffff9316f009>] kthread+0xd9/0xf0
[    3.750635]  [<ffffffff9316ef30>] ? kthread_park+0x60/0x60
[    3.750637]  [<ffffffff93f177c2>] ret_from_fork+0x22/0x30
[    3.750638] ---[ end trace 5d7bcc76a447def3 ]---
[    3.751373] ------------[ cut here ]------------

There's a visual effect too (see screenshot).
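
For readers unfamiliar with the condition printed in the trace, WARN_ON(!msg->buffer != !msg->size) fires when exactly one of the two fields is zero: a buffer pointer with zero size, or a nonzero size with no buffer. The following is a minimal user-space sketch of that check only. The struct and function names are hypothetical stand-ins, modeled on the two fields named in the warning text; this is not the i915 driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two fields the warning inspects. */
struct aux_msg_sketch {
    void *buffer;   /* payload pointer, may be NULL */
    size_t size;    /* payload length in bytes, may be 0 */
};

/*
 * Mirrors the condition from the log: !msg->buffer != !msg->size.
 * Returns true when exactly one of buffer/size is "empty", i.e. the
 * message is inconsistent and the kernel check would warn.
 */
static bool aux_msg_is_inconsistent(const struct aux_msg_sketch *msg)
{
    return !msg->buffer != !msg->size;
}
```

Under this reading, a message with a NULL buffer and size 0 is consistent, as is one with a valid buffer and a nonzero size; only the two mixed cases trip the check.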
Comment 1 Mihai Donțu 2016-10-16 05:13:02 UTC
Created attachment 241801
4.9-rc1 konsole screenshot
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-10-30 11:34:50 UTC
JFYI: I added this report to the list of regressions for Linux 4.9. I'll watch this thread for further updates on this issue to document progress in my weekly reports. Please let me know via regressions@leemhuis.info in case the discussion moves to a different place (bugzilla or another mail thread for example). tia!

Jani: what's the status here: The patches in the linked bug report seem to be a few weeks old already and there doesn't seem to be any progress.

Ciao, Thorsten
Comment 4 Martin Steigerwald 2016-11-06 17:52:49 UTC

mentioned in the other bug report and the following LKML thread does not fix the issue for me:

Re: [REGRESSION] Linux 4.9-rc4: gfx glitches on Intel Sandybridge (was: Re: Linux 4.9-rc4)

Comment 5 Jani Nikula 2016-11-07 16:48:02 UTC
This bug conflates *two* issues: the WARNING in comment #0 and the display corruption in comment #1. They are most likely unrelated.

This bug is about the warning, per both the bug title and first description being solely about it. The warning is a dupe of https://bugs.freedesktop.org/show_bug.cgi?id=97344.

This bug is CLOSED MOVED. Track the warning at the fdo bug.

Please do not add any information about any display corruption to this bug. Please do not track this bug for any display corruption issues. If there isn't a bug about it, file a new bug at the fdo bugzilla for that.

Thank you.
Comment 6 Jani Nikula 2016-11-07 16:48:39 UTC
Comment on attachment 241801
4.9-rc1 konsole screenshot

Obsolete the screenshot attachment. This bug is about the warning.
Comment 7 Jani Nikula 2016-11-07 16:52:06 UTC
The corruption issue is likely https://bugs.freedesktop.org/show_bug.cgi?id=98402
Comment 8 Martin Ziegler 2016-11-15 14:19:49 UTC
The regression https://bugs.freedesktop.org/show_bug.cgi?id=98287 "gpu hangs after hibernation" which hit me in 4.9-rc1 is still present in 4.9.0-rc5.
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-11-20 11:27:06 UTC
@the two Martins: please let me know if there are any more 4.9 regressions that are still unresolved (and where they are tracked); I lost track with all the different bug entries.
Comment 10 Martin Ziegler 2016-11-20 17:34:42 UTC
I reported the bug (gpu hangs after hibernation. Importance: highest blocker) on 2016-10-17.

The bug was then marked as a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=97344 (WARN_ON(!msg->buffer != !msg->size). Importance: high normal) 

Since I never saw a fix for the problem, I started to follow 177701 (for reasons I forgot). 

A test of the latest kernel

  commit 77079b133f242d3e3710c9b89ed54458307e54ff
  Author: Linus Torvalds <torvalds@linux-foundation.org>
  Date:   Sat Nov 19 18:40:47 2016 -0800

showed the bug is still there. After boot, dmesg shows the following line 49 times:

  WARNING: CPU: 1 PID: 36 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1e7/0x220

After hibernation the GPU hangs:

  kernel: [drm] GPU HANG: ecode 8:1:0xcefbfece, in Xorg [2250], reason: Hang on blitter ring, action: reset

(The GPU recovers after a few seconds, though.)

Comment 11 Martin Steigerwald 2016-11-20 18:25:31 UTC
> --- Comment #9 from Thorsten Leemhuis <regressions@leemhuis.info> ---
> @the two Martins: please let me know if there are any more 4.9 regressions
> that are still unresolved (and where they are tracked); I lost track with
> all the different bug entries.

I am overwhelmed with other stuff and have more important challenges in front 
of me. So I will stay at 4.8 for now.
Comment 12 Martin Ziegler 2016-11-27 23:58:25 UTC
The GPU hang after hibernation is still present in 4.9-rc7.
Comment 13 Martin Steigerwald 2016-12-01 08:07:28 UTC
Martin, I can confirm this. I just had a GPU hang after hibernation on my ThinkPad T520 this morning with 4.9-rc7 + 3 mini merges, compiled yesterday. I didn't test whether it hangs with PlaneShift, but I cannot afford an unstable kernel at the moment, as it takes time I want to spend elsewhere right now. So I am back on the 4.8 kernel once again.
Comment 14 Martin Steigerwald 2016-12-01 08:09:25 UTC
Hmmm, GPU hang in PlaneShift might be related, will report bug at fdo:

merkaba:~#1> zgrep "GPU HANG" /var/log/kern.log*
/var/log/kern.log.3.gz:Nov  8 21:19:08 merkaba kernel: [ 8401.004898] [drm] GPU HANG: ecode 6:0:0x85fffffc, in psclient.bin [8120], reason: Hang on render ring, action: reset
Comment 15 Martin Steigerwald 2016-12-01 08:26:43 UTC
Thorsten, I added the information I have to
Bug 98288 - linux 4.9-r1: gpu hangs after hibernation 

and reported the PlaneShift GPU hang (I haven't yet rechecked with the latest 4.9) at:

Bug 98922 - GPU hang on PlaneShift