Created attachment 100571 [details]
intel-reg-dump-3.9-edid-checksum.log

A new error has appeared with kernel version 3.9, sometimes once, often multiple times:

[ 0.509976] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 226
[ 0.510054] Raw EDID:
[ 0.510091] 00 ff ff ff ff ff ff 00 15 c3 39 21 01 01 01 01
[ 0.510132] 24 14 01 04 a5 30 1e 78 e2 f5 c5 a8 53 37 ae 25
[ 0.510173] 12 50 54 a1 08 00 a9 40 81 80 81 40 b3 00 01 01
[ 0.510214] 01 01 01 01 01 01 28 3c 80 a0 70 b0 23 40 30 20
[ 0.510255] 36 00 da 29 11 00 00 1a 00 00 00 ff 00 33 39 35
[ 0.510296] 32 37 30 39 30 0a 20 20 20 20 00 00 00 fd 00 3b
[ 0.510337] 3d 1f 4c 11 00 0a 20 20 20 20 20 00 00 00 fc 00
[ 0.510378] 53 32 32 34 33 57 0a 20 20 20 20 20 20 01 e4 02
[ 0.572892] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 226
[ 0.572964] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 226
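For reference, the check behind this message is plain arithmetic: the 128 bytes of an EDID block have to sum to 0 modulo 256. A minimal Python sketch, using the bytes from the Raw EDID dump in this comment. The function name `edid_block_remainder` is made up for illustration; this is not the kernel's code.

```python
# Bytes copied from the "Raw EDID" dump in the kernel log of this comment.
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 15 c3 39 21 01 01 01 01
24 14 01 04 a5 30 1e 78 e2 f5 c5 a8 53 37 ae 25
12 50 54 a1 08 00 a9 40 81 80 81 40 b3 00 01 01
01 01 01 01 01 01 28 3c 80 a0 70 b0 23 40 30 20
36 00 da 29 11 00 00 1a 00 00 00 ff 00 33 39 35
32 37 30 39 30 0a 20 20 20 20 00 00 00 fd 00 3b
3d 1f 4c 11 00 0a 20 20 20 20 20 00 00 00 fc 00
53 32 32 34 33 57 0a 20 20 20 20 20 20 01 e4 02
"""


def edid_block_remainder(block: bytes) -> int:
    """Return the sum of all bytes of one 128-byte EDID block, modulo 256.

    A block with a valid checksum yields 0.
    """
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return sum(block) % 256


# Strip all whitespace before parsing the hex digits.
block = bytes.fromhex("".join(RAW_EDID_HEX.split()))
remainder = edid_block_remainder(block)
print(f"remainder is {remainder}")  # prints "remainder is 226"
```

The bytes sum to 8418, which leaves 226 modulo 256, matching the value in the error message.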
Created attachment 100581 [details] dmesg-3.9-edid-checksum.log
Apart from the log message it doesn't seem to do much harm; everything works as usual. I can also trigger the error when something accesses the EDID, e.g. when starting X or running 'oyranos-monitor', but not every time. So, there is some randomness.
I just noticed that the EDID block is different each time.
So far, this only happens on my external Eizo screen connected with DP, not with LVDS, and also not with an external Lenovo display (which has a DP-DVI adaptor that has had EDID trouble since 3.8 already - no fbcon - I haven't got around to bisecting that yet). I will provide a valid EDID block from 3.8.11 as soon as I'm back at the Eizo setup.
Imre, this smells a bit like the gmbus version of the wait_for_event_timeout bug you've tracked down in dp aux transactions. Can you please attach a quick test patch?
Created attachment 100821 [details] fix for gmbus false timeouts
Created attachment 100831 [details]
fix for dpaux false timeouts

> Imre, this smells a bit like the gmbus version of the wait_for_event_timeout
> bug you've tracked down in dp aux transactions. Can you please attach a quick
> test patch?

I attached the fix for both DP and GMBUS. I'm not sure how this could be GMBUS related though, since afaics on GM45 we don't use IRQ for GMBUS transfers. At least since commit #c12aba5aa0 - "drm/i915: stop using GMBUS IRQs on Gen4 chips". Also with DP we'd get an error message about the timeout and I don't see any trace of it in the attached dmesg. So I'm doubtful this will fix anything but maybe I'm missing something..

Andreas could you try applying both patches on the git://people.freedesktop.org/~danvet/drm-intel drm-intel-nightly branch and see if you can still reproduce the issue? Also a dmesg log would be nice with the drm.debug=0xf kernel parameter.
I've applied both patches on vanilla 3.9.0 and 3.9.1. The regular dmesg log still contains the EDID errors, but so far I haven't caught them with drm.debug=0xf...
Created attachment 101011 [details]
dmesg-3.9.0-r1-crash-and-edid-checksum-130508.log

kernel log with patches applied

This was generated after an overnight build with the DP monitor switched off. At some point X crashed - see kernel oops - maybe related?
Created attachment 101021 [details]
dmesg-3.9.1-edid-drmdebug-130509.log

kernel log with patches applied and drm.debug=0xf

dmesg output from startup until xdm (going further spams the log to no avail), then switching the monitor off and on a few times in fbcon, and then switching to X and back again.
Created attachment 101321 [details]
dmesg-3.9.2-edid-checksum-drmdebug-130513.log

Finally back at that setup, and here it is: dmesg output with drm.debug enabled that contains the EDID checksum failure.
Hm, I guess we're a bit lost with this one. Can you please try to bisect which patch exactly introduced this regression?
Just a note that it's still present in a recent drm-intel-nightly image. Before doing a bisect, I will try with a radeon box on the same monitor and DP interface just to see if it happens there as well. Bisecting will - again - involve a tremendous amount of restarts, because the EDID error does not appear all the time. That will probably require some automation...
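Since the error is intermittent, one way to approach that automation is to save the dmesg output of every boot of a candidate kernel and only call the kernel good after several clean boots. A rough Python sketch of the decision logic; the helper names and the `min_clean_boots` threshold of 10 are assumptions for illustration, and the reboot and log collection steps are left out.

```python
import re
from typing import List

# Matches the error line quoted in this bug and captures the remainder.
EDID_ERROR_RE = re.compile(r"EDID checksum is invalid, remainder is (\d+)")


def edid_error_remainders(dmesg_text: str) -> List[int]:
    """Return the remainder of every EDID checksum error found in a dmesg log."""
    return [int(value) for value in EDID_ERROR_RE.findall(dmesg_text)]


def bisect_verdict(boot_logs: List[str], min_clean_boots: int = 10) -> str:
    """Classify one kernel build from the dmesg logs of several boots.

    One log with the error is enough for "bad". "good" needs at least
    min_clean_boots clean logs (hypothetical threshold), because the
    error does not show up on every boot. Otherwise: "undecided".
    """
    for log in boot_logs:
        if edid_error_remainders(log):
            return "bad"
    return "good" if len(boot_logs) >= min_clean_boots else "undecided"
```

The verdict would then be fed to `git bisect good` or `git bisect bad` by hand; "undecided" means the kernel needs more boots before marking it.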
(In reply to Andreas Sturmlechner from comment #13)
> Just a note that it's still present in a recent drm-intel-nightly image.
>
> Before doing a bisect, I will try with a radeon box on the same monitor and
> DP interface just to see if it happens there as well. Bisecting will - again
> - involve a tremendous amount of restarts, because the EDID error does not
> appear all the time. That will probably require some automation...

Ping for any news.
No time for bisecting yet, but at least I can now say for certain that it doesn't happen with the radeon box - though that one can only be connected via HDMI. Both were tried with 3.11.0.
Created attachment 108441 [details]
dmesg-3.11.0-drmdebug0xf-130915.log

Some news at least. Since I've seen hotplug errors mixed with the EDID checksum errors and a kernel oops as well, I did boot with drm.debug=0xf and fiddled with the DP cable a bit. After un- and re-plugging 3 times (it can be followed by the printk times) it finally triggered another kernel oops, attaching the resulting dmesg. I think that is related, and I couldn't trigger that error in 3.8.13.
Created attachment 108451 [details]
intel-reg-dump-3.11.0-drmdebug0xf-130915.log

regdump after the kernel oops had happened
The backtraces aren't full oopses but just warnings about inconsistencies in our DP code. They're unrelated to the issue at hand here and already tracked in https://bugs.freedesktop.org/show_bug.cgi?id=69251

Also mostly harmless ;-)
Well, OK, then sorry for the noise ;)
I guess we're stuck here a bit. Any time for some bisecting...?
Still no bisecting yet - my work schedule is too busy to dedicate a whole day to it, and the system is weak on CPU power as well. 3.12 at least shows the same errors, no change so far, while 3.4 (.68) is fine.
Note that recent kernels (i.e. starting from 3.12) tuned the warning down a bit, you'll only see it once. Just to make sure you don't get thrown off by that when bisecting ;-)
(In reply to Daniel Vetter from comment #22)
> Note that recent kernels (i.e. starting from 3.12) tuned the warning down a
> bit, you'll only see it once. Just to make sure you don't get thrown off by
> that when bisecting ;-)

In current intel-drm-nightly (3.13.0-rc4+) I couldn't grep an EDID dmesg at all so far, but there are problems on the DisplayPort setup nevertheless, as described in https://bugs.freedesktop.org/show_bug.cgi?id=53385#c46 (perhaps that post would have better fitted in here). But I'm not sure if related or intertwined with your bug from above or whether to open yet another bug.
Timeout. Please reopen if the problem persists with recent kernels.
Created attachment 146701 [details]
20140812-1740_3.16.0_dmesg-ON.log

*scans logs* - yes, happened 3 days ago with 3.16.0, accompanied by a pipe A underrun.

It will be interesting to see if there is any trouble with my soon-to-be-carried-over Haswell box on the same display.
Adjusted priority to reflect reality. Can you provide dmesg with drm.debug=14 module parameter set, reproducing the issue?
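The drm.debug value is a bitmask, so 14 (0xe) and the 0xf used earlier in this bug differ only in the lowest bit. A small Python sketch that decodes the mask, assuming the four low bits map to the kernel's DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS and DRM_UT_PRIME categories. Newer kernels define further bits, which are left out here, and the function name is made up for illustration.

```python
# Low four bits of the drm.debug mask (assumed from the kernel's
# DRM_UT_* debug categories; higher bits are ignored in this sketch).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask: int) -> list:
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0xF))  # ['CORE', 'DRIVER', 'KMS', 'PRIME']
print(decode_drm_debug(14))   # ['DRIVER', 'KMS', 'PRIME']
```

Under that assumption, drm.debug=14 gives the driver, KMS and PRIME messages without the core messages.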
Long time no updates, closing. If the problem persists with latest kernels, please file a bug at the freedesktop.org bugzilla [1], referencing this bug. Thank you.

[1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel