Bug 57461 - [GM45] early in dmesg: *ERROR* EDID checksum is invalid
Summary: [GM45] early in dmesg: *ERROR* EDID checksum is invalid
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P4 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-02 20:17 UTC by Andreas Sturmlechner
Modified: 2015-10-07 10:56 UTC

See Also:
Kernel Version: 3.9
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
intel-reg-dump-3.9-edid-checksum.log (13.52 KB, text/plain)
2013-05-02 20:17 UTC, Andreas Sturmlechner
dmesg-3.9-edid-checksum.log (82.89 KB, text/plain)
2013-05-02 20:19 UTC, Andreas Sturmlechner
fix for gmbus false timeouts (985 bytes, patch)
2013-05-06 12:49 UTC, Imre Deak
fix for dpaux false timeouts (2.10 KB, patch)
2013-05-06 13:06 UTC, Imre Deak
dmesg-3.9.0-r1-crash-and-edid-checksum-130508.log (61.79 KB, text/plain)
2013-05-09 09:08 UTC, Andreas Sturmlechner
dmesg-3.9.1-edid-drmdebug-130509.log (235.47 KB, text/plain)
2013-05-09 09:15 UTC, Andreas Sturmlechner
dmesg-3.9.2-edid-checksum-drmdebug-130513.log (234.58 KB, text/plain)
2013-05-13 18:33 UTC, Andreas Sturmlechner
dmesg-3.11.0-drmdebug0xf-130915.log (162.87 KB, text/plain)
2013-09-15 21:13 UTC, Andreas Sturmlechner
intel-reg-dump-3.11.0-drmdebug0xf-130915.log (13.56 KB, text/plain)
2013-09-15 21:15 UTC, Andreas Sturmlechner
20140812-1740_3.16.0_dmesg-ON.log (60.81 KB, text/plain)
2014-08-15 12:01 UTC, Andreas Sturmlechner

Description Andreas Sturmlechner 2013-05-02 20:17:56 UTC
Created attachment 100571
intel-reg-dump-3.9-edid-checksum.log

A new error has appeared with kernel version 3.9, sometimes once, often multiple times:


[    0.509976] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 226
[    0.510054] Raw EDID:
[    0.510091]          00 ff ff ff ff ff ff 00 15 c3 39 21 01 01 01 01
[    0.510132]          24 14 01 04 a5 30 1e 78 e2 f5 c5 a8 53 37 ae 25
[    0.510173]          12 50 54 a1 08 00 a9 40 81 80 81 40 b3 00 01 01
[    0.510214]          01 01 01 01 01 01 28 3c 80 a0 70 b0 23 40 30 20
[    0.510255]          36 00 da 29 11 00 00 1a 00 00 00 ff 00 33 39 35
[    0.510296]          32 37 30 39 30 0a 20 20 20 20 00 00 00 fd 00 3b
[    0.510337]          3d 1f 4c 11 00 0a 20 20 20 20 20 00 00 00 fc 00
[    0.510378]          53 32 32 34 33 57 0a 20 20 20 20 20 20 01 e4 02
[    0.572892] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 226
[    0.572964] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 226
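
For reference, an EDID block is 128 bytes long and is valid only if the sum of all its bytes is 0 modulo 256; the "remainder" in the message is that sum. The following is a minimal Python sketch, not the kernel's code: the helper name `edid_remainder` is made up for illustration, and the bytes are copied from the raw dump above. It reproduces the value reported in the log.

```python
# Bytes copied from the "Raw EDID" dump above (one 128-byte block).
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 15 c3 39 21 01 01 01 01
24 14 01 04 a5 30 1e 78 e2 f5 c5 a8 53 37 ae 25
12 50 54 a1 08 00 a9 40 81 80 81 40 b3 00 01 01
01 01 01 01 01 01 28 3c 80 a0 70 b0 23 40 30 20
36 00 da 29 11 00 00 1a 00 00 00 ff 00 33 39 35
32 37 30 39 30 0a 20 20 20 20 00 00 00 fd 00 3b
3d 1f 4c 11 00 0a 20 20 20 20 20 00 00 00 fc 00
53 32 32 34 33 57 0a 20 20 20 20 20 20 01 e4 02
"""


def edid_remainder(block: bytes) -> int:
    """Return the sum of all bytes of a 128-byte EDID block modulo 256.

    A block with a correct checksum yields 0.
    (Hypothetical helper, named here for illustration only.)
    """
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return sum(block) % 256


# Strip all whitespace before parsing so this works on any Python 3 version.
block = bytes.fromhex("".join(RAW_EDID_HEX.split()))
print(edid_remainder(block))  # 226, matching "remainder is 226" in the log
```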
Comment 1 Andreas Sturmlechner 2013-05-02 20:19:45 UTC
Created attachment 100581
dmesg-3.9-edid-checksum.log
Comment 2 Andreas Sturmlechner 2013-05-02 20:25:55 UTC
Besides, it doesn't seem to do much harm; everything works as usual.

I can also trigger the error when something accesses the EDID, e.g. when starting X or running 'oyranos-monitor', but not every time, so there is some randomness.
Comment 3 Andreas Sturmlechner 2013-05-02 20:28:41 UTC
I just noticed that the EDID block is different each time.
Comment 4 Andreas Sturmlechner 2013-05-04 16:33:01 UTC
So far, this only happens on my external Eizo screen connected via DP, not with LVDS, and not with an external Lenovo display either (which is connected through a DP-DVI adaptor and has had EDID trouble since 3.8 - no fbcon - which I haven't gotten around to bisecting yet).

I will provide a valid EDID block from 3.8.11 as soon as I'm back at the Eizo setup.
Comment 5 Daniel Vetter 2013-05-06 08:22:24 UTC
Imre, this smells a bit like the gmbus version of the wait_for_event_timeout bug you've tracked down in dp aux transactions. Can you please attach a quick test patch?
Comment 6 Imre Deak 2013-05-06 12:49:05 UTC
Created attachment 100821
fix for gmbus false timeouts
Comment 7 Imre Deak 2013-05-06 13:06:59 UTC
Created attachment 100831
fix for dpaux false timeouts

> Imre, this smells a bit like the gmbus version of the wait_for_event_timeout
> bug you've tracked down in dp aux transactions. Can you please attach a quick
> test patch?

I attached the fix for both DP and GMBUS.

I'm not sure how this could be GMBUS related though, since afaics on GM45 we don't use IRQ for GMBUS transfers. At least since commit #c12aba5aa0 - "drm/i915: stop using GMBUS IRQs on Gen4 chips".

Also with DP we'd get an error message about the timeout and I don't see any trace of it in the attached dmesg. So I'm doubtful this will fix anything but maybe I'm missing something..

Andreas, could you try applying both patches on the git://people.freedesktop.org/~danvet/drm-intel drm-intel-nightly branch and see if you can still reproduce the issue? A dmesg log captured with the drm.debug=0xf kernel parameter would also be nice.
Comment 8 Andreas Sturmlechner 2013-05-09 09:04:32 UTC
I've applied both patches to vanilla 3.9.0 and 3.9.1. The regular dmesg log still contains the EDID errors, but so far I haven't caught them with drm.debug=0xf...
Comment 9 Andreas Sturmlechner 2013-05-09 09:08:42 UTC
Created attachment 101011
dmesg-3.9.0-r1-crash-and-edid-checksum-130508.log

kernel log with patches applied

This was generated after an overnight build with the DP monitor switched off. At some point X crashed (see the kernel oops) - maybe related?
Comment 10 Andreas Sturmlechner 2013-05-09 09:15:20 UTC
Created attachment 101021
dmesg-3.9.1-edid-drmdebug-130509.log

kernel log with patches applied and drm.debug=0xf

dmesg output from boot up to xdm (going further only spams the log to no avail), while switching the monitor off and on a few times in fbcon and then switching to X and back again.
Comment 11 Andreas Sturmlechner 2013-05-13 18:33:31 UTC
Created attachment 101321
dmesg-3.9.2-edid-checksum-drmdebug-130513.log

Finally back at that setup, and here it is: dmesg output with drm.debug enabled that contains the EDID checksum failure.
Comment 12 Daniel Vetter 2013-05-20 20:38:56 UTC
Hm, I guess we're a bit lost with this one here. Can you please try to bisect which patch exactly introduced this regression?
Comment 13 Andreas Sturmlechner 2013-07-20 16:37:46 UTC
Just a note that it's still present in a recent drm-intel-nightly image.

Before doing a bisect, I will try with a radeon box on the same monitor and DP interface just to see if it happens there as well. Bisecting will - again - involve a tremendous amount of restarts, because the EDID error does not appear all the time. That will probably require some automation...
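
One way such automation could look, as a rough sketch only: collect the dmesg output of several boots of the kernel under test, then classify the set using the exit-status convention of `git bisect run` (0 = good, 1 = bad, 125 = skip). The function names and the threshold of 10 clean boots are assumptions made for illustration, and the reboot-and-collect step is not covered here.

```python
from pathlib import Path
from typing import Sequence

# Message text as it appears in the kernel log quoted in this report.
ERROR_TEXT = "EDID checksum is invalid"

# Assumption: how many clean boots are needed before a kernel is called good.
# The error is intermittent, so a single clean boot proves little.
MIN_CLEAN_BOOTS = 10


def classify(logs: Sequence[str], min_clean_boots: int = MIN_CLEAN_BOOTS) -> int:
    """Map the dmesg logs of one kernel to a `git bisect run` exit status."""
    # Test for presence, not count: the error shows up once or many times.
    if any(ERROR_TEXT in log for log in logs):
        return 1    # bad: the error appeared in at least one boot
    if len(logs) < min_clean_boots:
        return 125  # skip: too few boots to call this kernel good
    return 0        # good: enough boots, none with the error


def main(argv: Sequence[str]) -> int:
    """Read the dmesg log files named in argv and classify them."""
    logs = [Path(name).read_text(errors="replace") for name in argv]
    return classify(logs)


sample_bad = (
    "[    0.509976] [drm:drm_edid_block_valid] *ERROR* "
    "EDID checksum is invalid, remainder is 226\n"
)
print(classify([sample_bad]))             # 1
print(classify(["clean boot\n"] * 10))    # 0
print(classify(["clean boot\n"] * 3))     # 125
```

To use it as a script, one would end the file with `sys.exit(main(sys.argv[1:]))` after importing `sys`.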
Comment 14 Jani Nikula 2013-09-10 08:39:48 UTC
(In reply to Andreas Sturmlechner from comment #13)
> Just a note that it's still present in a recent drm-intel-nightly image.
> 
> Before doing a bisect, I will try with a radeon box on the same monitor and
> DP interface just to see if it happens there as well. Bisecting will - again
> - involve a tremendous amount of restarts, because the EDID error does not
> appear all the time. That will probably require some automation...

Ping for any news.
Comment 15 Andreas Sturmlechner 2013-09-13 21:25:04 UTC
No time for bisecting yet, but at least I'm now certain that it doesn't happen with the radeon box, although that one can only be connected via HDMI. Both were tried with 3.11.0.
Comment 16 Andreas Sturmlechner 2013-09-15 21:13:57 UTC
Created attachment 108441
dmesg-3.11.0-drmdebug0xf-130915.log

Some news at least. Since I've seen hotplug errors mixed with the EDID checksum errors and a kernel oops as well, I booted with drm.debug=0xf and fiddled with the DP cable a bit. After un- and re-plugging 3 times (this can be followed via the printk timestamps) it finally triggered another kernel oops; I'm attaching the resulting dmesg. I think that is related, and I couldn't trigger that error in 3.8.13.
Comment 17 Andreas Sturmlechner 2013-09-15 21:15:54 UTC
Created attachment 108451
intel-reg-dump-3.11.0-drmdebug0xf-130915.log

regdump after the kernel oops had happened
Comment 18 Daniel Vetter 2013-09-16 06:10:15 UTC
The backtraces aren't full oopses but just warnings about inconsistencies in our DP code. They are unrelated to the issue at hand here and already tracked in https://bugs.freedesktop.org/show_bug.cgi?id=69251 . Also mostly harmless ;-)
Comment 19 Andreas Sturmlechner 2013-09-16 11:35:04 UTC
Well, OK, then sorry for the noise ;)
Comment 20 Jani Nikula 2013-10-09 12:17:24 UTC
I guess we're stuck here a bit. Any time for some bisecting...?
Comment 21 Andreas Sturmlechner 2013-11-10 13:21:37 UTC
Still no bisecting yet - my work schedule is too busy to dedicate a whole day to it, and my system is weak on CPU power as well.

3.12 at least shows the same errors, no change so far, while 3.4 (3.4.68) is fine.
Comment 22 Daniel Vetter 2013-11-10 13:30:57 UTC
Note that recent kernels (i.e. starting from 3.12) tuned the warning down a bit, you'll only see it once. Just to make sure you don't get thrown off by that when bisecting ;-)
Comment 23 Andreas Sturmlechner 2013-12-27 17:19:22 UTC
(In reply to Daniel Vetter from comment #22)
> Note that recent kernels (i.e. starting from 3.12) tuned the warning down a
> bit, you'll only see it once. Just to make sure you don't get thrown off by
> that when bisecting ;-)
In current drm-intel-nightly (3.13.0-rc4+) I haven't been able to grep any EDID message in dmesg so far, but there are problems with the DisplayPort setup nevertheless, as described in https://bugs.freedesktop.org/show_bug.cgi?id=53385#c46 (perhaps that post would have fitted better here). I'm not sure whether that is related to or intertwined with your bug from above, or whether I should open yet another bug.
Comment 24 Jani Nikula 2014-08-14 08:19:31 UTC
Timeout. Please reopen if the problem persists with recent kernels.
Comment 25 Andreas Sturmlechner 2014-08-15 12:01:50 UTC
Created attachment 146701
20140812-1740_3.16.0_dmesg-ON.log

*scans logs* - yes, happened 3 days ago with 3.16.0, accompanied by a pipe A underrun.

It will be interesting to see if there is any trouble with my soon-to-be-carried-over Haswell box on the same display.
Comment 26 Jani Nikula 2015-06-16 09:25:54 UTC
Adjusted priority to reflect reality. Can you provide a dmesg captured with the drm.debug=14 module parameter set while reproducing the issue?
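
For reference, `drm.debug` is a bitmask, so the value 14 requested here is 0xe, i.e. the 0xf used earlier in this report without the lowest bit. A small Python sketch that decodes both values, assuming the category bits defined by the DRM core (0x1 core, 0x2 driver, 0x4 KMS, 0x8 PRIME); newer kernels define further bits that are not listed here, and the helper name is made up for illustration.

```python
# Assumed drm.debug category bits (lowest four only).
DRM_DEBUG_BITS = {
    0x1: "core",
    0x2: "driver",
    0x4: "kms",
    0x8: "prime",
}


def decode_drm_debug(value: int) -> list:
    """Return the names of the debug categories enabled by a drm.debug value."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if value & bit]


print(decode_drm_debug(0xF))  # ['core', 'driver', 'kms', 'prime']
print(decode_drm_debug(14))   # ['driver', 'kms', 'prime']
```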
Comment 27 Jani Nikula 2015-10-07 10:56:52 UTC
Long time no updates, closing.

If the problem persists with latest kernels, please file a bug at the freedesktop.org bugzilla [1], referencing this bug. Thank you.

[1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel
