Bug 28812

Summary: DVI attached monitor is turned off while booting linux 2.6.37 and higher
Product: Drivers
Reporter: Markus Heinz (markus.heinz)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: CLOSED CODE_FIX
Severity: blocking
CC: angelo70, chris, daniel, florian, maciej.rutecki, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 21782
Attachments: kernel log from a linux 2.6.38-rc4 boot
Hardware configuration
kernel log from booting linux 2.6.38-rc7
linux kernel from first bad revision
dmesg from a patched linux 2.6.38.1 kernel
dmesg from unpatched linux 2.6.38.1 kernel
i915_opregion file from patched 2.6.38.1 kernel
Output of dmidecode
dmi quirk-away broken vbt

Description Markus Heinz 2011-02-10 19:33:37 UTC
When booting Linux kernel 2.6.37 from kernel.org, my monitor (an Asus VW222U attached via the DVI port) is turned off just as the kernel begins booting. It stays off afterwards.

There are messages like these in the kernel log:

Feb 10 20:03:33 Darksun kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 96
Feb 10 20:03:33 Darksun kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Feb 10 20:03:33 Darksun kernel: <3>24 41 44 32 01 01 08 23 11 c1 66 00 00 00 00 00  $AD2...#..f.....
Feb 10 20:03:33 Darksun kernel: <3>43 58 05 29 00 00 53 60 00 00 00 00 00 00 70 03  CX.)..S`......p.
Feb 10 20:03:33 Darksun kernel: <3>ff 10 30 00 c8 00 01 00 00 00 00 00 00 00 00 00  ..0.............
Feb 10 20:03:33 Darksun kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

The problem persists up to and including Linux 2.6.38-rc4. With Linux 2.6.36 everything works as expected: the above messages are not in the log and the monitor is not turned off.
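
For reference, the check behind these messages can be sketched roughly as below. This is a minimal illustration, not the kernel's code: it assumes the standard EDID layout (a 128-byte block with a fixed 8-byte header whose bytes sum to zero modulo 256), and the function names are made up for this example. Note that the raw dump above starts with "$AD2", which is not the EDID header at all.

```python
# Sketch of a basic EDID block sanity check (assumed layout, hypothetical names).
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
EDID_BLOCK_LEN = 128


def has_edid_header(block: bytes) -> bool:
    """Return True if the block starts with the fixed 8-byte EDID header."""
    return block[:8] == EDID_HEADER


def checksum_remainder(block: bytes) -> int:
    """Return the byte sum modulo 256; a valid block yields 0."""
    if len(block) != EDID_BLOCK_LEN:
        raise ValueError("an EDID block must be exactly 128 bytes")
    return sum(block) % 256


# First 8 bytes of the raw dump quoted in the log above.
logged_prefix = bytes.fromhex("24 41 44 32 01 01 08 23")

# Synthetic, otherwise empty block whose last byte makes the sum come out to 0.
valid_block = EDID_HEADER + bytes(119) + bytes([6])
```

Only the first 64 bytes of the block appear in the excerpt above, so the reported remainder cannot be recomputed from it; the header comparison, however, already fails on the first byte.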
Comment 1 Markus Heinz 2011-02-10 19:36:17 UTC
Created attachment 47182 [details]
kernel log from a linux 2.6.38-rc4 boot

I have attached the boot log of a linux 2.6.38-rc4 kernel.
Comment 2 Markus Heinz 2011-02-10 19:38:56 UTC
Created attachment 47192 [details]
Hardware configuration

I have attached the output of "lspci -vvv" on my system.
Comment 3 Markus Heinz 2011-03-05 17:20:43 UTC
Created attachment 50102 [details]
kernel log from booting linux 2.6.38-rc7

Update: When booting linux 2.6.38-rc7 the monitor is no longer turned off. But the framebuffer text consoles and the X server do not come up in the right resolution. It seems they come up in 1024x768 instead of 1680x1050. The xdm login box is partly off screen. And the framebuffer text consoles are filled with endless messages about the EDID checksum error.
Comment 4 Markus Heinz 2011-03-20 16:19:40 UTC
The release kernel linux-2.6.38 behaves no differently from 2.6.38-rc7. See the previous comment.
Comment 5 Chris Wilson 2011-03-21 08:38:42 UTC
Can you tell us which commit between 2.6.36-rc4 and 2.6.37 broke the EDID retrieval? I can guess it is SDVO port related... drm.debug=0xe would help, but not as much as that and the bisection.
Comment 6 Markus Heinz 2011-03-26 17:35:04 UTC
I have bisected the kernel between v2.6.36 and v2.6.37. The result is:

44834a67c0082e2cf74b16be91e49108b1432d65 is the first bad commit
commit 44834a67c0082e2cf74b16be91e49108b1432d65
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 19 16:09:23 2010 +0100

    drm/i915: Use the VBT from OpRegion when available (v3)
    
    It is recommended that we use the Video BIOS tables that were copied
    into the OpRegion during POST when initialising the driver. This saves
    us from having to furtle around inside the ROM ourselves and possibly
    allows the vBIOS to adjust the tables prior to initialisation.
    
    On some systems, such as the Samsung N210, there is no accessible VBIOS
    and the only means of finding the VBT is through the OpRegion.
    
    v2: Rearrange the code so that ASLE is enabled along with ACPI
    v3: Enable OpRegion parsing even without ACPI
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Garrett <mjg@redhat.com>

:040000 040000 9e9d888e698bcee28fbbf040f0b0a68450fe2d57 669037950f60b494886259b930dedcd92c318f03 M      drivers
Comment 7 Markus Heinz 2011-03-26 17:44:54 UTC
Created attachment 52082 [details]
linux kernel from first bad revision

I have attached the kernel log from booting the kernel with the first bad revision from bisecting between v2.6.36 and v2.6.37 with extended debug output.
Comment 8 Florian Mickler 2011-03-27 07:14:06 UTC
First-Bad-Commit: 44834a67c0082e2cf74b16be91e49108b1432d65
Comment 9 Chris Wilson 2011-03-27 07:28:11 UTC
That would only make sense if the VBT differed between the ROM and the copy in OpRegion memory, for example if you had a machine which had no ROM. A debug dmesg for 44834a67^ would confirm that.

If there is a ROM, then we need to dump both to see what differed and suggest a course of action. Use intel_bios_read (part of intel-gpu-tools) and /sys/kernel/debug/dri/0/i915_opregion for that. Otherwise, it is just our use of some information from the VBT that conflicts with reality. More information on that front is found in the debug dmesg of later kernels.

To verify the bisection, you can simply do:

diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c
index fb5b4d4..fac32cf 100644
--- a/drivers/gpu/drm/i915/intel_bios.c
+++ b/drivers/gpu/drm/i915/intel_bios.c
@@ -612,7 +612,7 @@ intel_parse_bios(struct drm_device *dev)
        init_vbt_defaults(dev_priv);
 
        /* XXX Should this validation be moved to intel_opregion.c? */
-       if (dev_priv->opregion.vbt) {
+       if (0 && dev_priv->opregion.vbt) {
                struct vbt_header *vbt = dev_priv->opregion.vbt;
                if (memcmp(vbt->signature, "$VBT", 4) == 0) {
                        DRM_DEBUG_DRIVER("Using VBT from OpRegion: %20s\n",
Comment 10 Markus Heinz 2011-03-27 10:35:19 UTC
I have patched a linux 2.6.38.1 kernel with the patch from comment #9 and have successfully booted it. The monitor is not turned off and the resulting screen resolution is fine, too.
Comment 11 Markus Heinz 2011-03-27 10:38:47 UTC
Created attachment 52142 [details]
dmesg from a patched linux 2.6.38.1 kernel

I have attached the dmesg output from the patched linux 2.6.38.1 kernel from the previous comment.
Comment 12 Chris Wilson 2011-03-27 10:55:31 UTC
Thanks, we get an interesting message:

[drm:parse_sdvo_device_mapping], the SDVO device with slave addr 70 is found on SDVOB port
[drm:parse_sdvo_device_mapping], SDVO device: dvo=1, addr=70, wiring=1, ddc_pin=29, i2c_pin=5, i2c_speed=67
[drm:parse_sdvo_device_mapping], the SDVO device with slave addr 70 is found on SDVOB port
[drm:parse_sdvo_device_mapping], Maybe one SDVO port is shared by two SDVO device.

Can you please also attach the debug dmesg for the unpatched 2.6.38?
Comment 13 Markus Heinz 2011-03-27 12:54:24 UTC
Created attachment 52152 [details]
dmesg from unpatched linux 2.6.38.1 kernel

I have attached the dmesg output from an unpatched linux 2.6.38.1 kernel.
Comment 14 Chris Wilson 2011-03-27 13:02:23 UTC
And so to compare the unpatched kernel:

[drm:intel_parse_bios], Using VBT from OpRegion: $VBT BEARLAKE-B     d
[drm:parse_general_definitions], crt_ddc_bus_pin: 2
[drm:parse_sdvo_device_mapping], No SDVO device info is found in VBT
[drm:parse_device_mapping], no child dev is parsed from VBT

Yay for out-of-spec BIOSes.
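
The comparison between the two logs can be automated along these lines. This is only a sketch: the helper names are invented for illustration, the sample strings are copied from the excerpts quoted in comments 12 and 14, and the field values are kept as strings because the logs do not say which base they are printed in.

```python
import re

# Matches only the detailed "SDVO device: key=value, ..." debug line.
SDVO_RE = re.compile(r"SDVO device: (.*)")


def parse_sdvo_devices(dmesg: str) -> list:
    """Return one dict of key/value strings per 'SDVO device:' line."""
    devices = []
    for line in dmesg.splitlines():
        match = SDVO_RE.search(line)
        if match:
            fields = match.group(1).split(", ")
            devices.append(dict(field.split("=", 1) for field in fields))
    return devices


def uses_opregion_vbt(dmesg: str) -> bool:
    """Return True if the log says the VBT was taken from the OpRegion."""
    return "Using VBT from OpRegion" in dmesg


PATCHED = """\
[drm:parse_sdvo_device_mapping], the SDVO device with slave addr 70 is found on SDVOB port
[drm:parse_sdvo_device_mapping], SDVO device: dvo=1, addr=70, wiring=1, ddc_pin=29, i2c_pin=5, i2c_speed=67
[drm:parse_sdvo_device_mapping], Maybe one SDVO port is shared by two SDVO device.
"""

UNPATCHED = """\
[drm:intel_parse_bios], Using VBT from OpRegion: $VBT BEARLAKE-B     d
[drm:parse_sdvo_device_mapping], No SDVO device info is found in VBT
[drm:parse_device_mapping], no child dev is parsed from VBT
"""
```

With the patched log, one SDVO device is found; with the unpatched log, the VBT comes from the OpRegion and no SDVO device is parsed.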
Comment 15 Chris Wilson 2011-03-27 13:04:00 UTC
Can you please attach /sys/kernel/debug/dri/0/i915_opregion?
Comment 16 Markus Heinz 2011-03-27 14:08:24 UTC
I do not have a file named "/sys/kernel/debug/dri/0/i915_opregion". How can I create it?
Comment 17 Florian Mickler 2011-03-29 21:20:14 UTC
Do you have debugfs mounted at /sys/kernel/debug? ("mount -t debugfs debugfs /sys/kernel/debug" will do the trick, if you have it activated in your kernel .config.)
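
One way to check this without guessing is to look for a debugfs entry in /proc/mounts. The sketch below assumes the usual whitespace-separated layout of that file (device, mount point, filesystem type, options, ...); the function name is made up for this example.

```python
def debugfs_mount_points(mounts_text: str) -> list:
    """Return every mount point whose filesystem type is debugfs."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields[1] is the mount point, fields[2] the filesystem type.
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


SAMPLE = (
    "proc /proc proc rw,relatime 0 0\n"
    "debugfs /sys/kernel/debug debugfs rw,relatime 0 0\n"
)
```

On a running system the function would be fed the contents of /proc/mounts; an empty result means debugfs still needs to be mounted.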
Comment 18 Markus Heinz 2011-03-30 16:57:04 UTC
Created attachment 52662 [details]
i915_opregion file from patched 2.6.38.1 kernel

I have attached the file i915_opregion from debugfs, taken while running a patched linux 2.6.38.1 kernel.
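
A dump like this, or a dump of the video ROM, can be searched for the "$VBT" signature that the driver checks for in the patch from comment #9. The following is a rough sketch, not the driver's logic: the helper names are hypothetical, and reading 20 bytes for the printable signature is an assumption based on the "%20s" format in that debug message.

```python
VBT_SIGNATURE = b"$VBT"
SIGNATURE_LEN = 20  # assumption: length of the printable signature field


def find_vbt(blob: bytes) -> int:
    """Return the offset of the first '$VBT' signature, or -1 if absent."""
    return blob.find(VBT_SIGNATURE)


def vbt_signature(blob: bytes) -> str:
    """Return the printable signature string, or '' if no VBT is found."""
    offset = find_vbt(blob)
    if offset < 0:
        return ""
    raw = blob[offset:offset + SIGNATURE_LEN]
    return raw.decode("ascii", errors="replace").rstrip(" \x00")


# Synthetic stand-in for a dump; a real one would be read from the attached file.
fake_dump = bytes(16) + b"$VBT BEARLAKE-B     " + bytes(8)
```

Running both the OpRegion dump and the ROM dump through the same helper gives two offsets from which the tables can then be compared byte by byte.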
Comment 19 Markus Heinz 2011-08-06 14:47:55 UTC
Linux 3.0.1 is still affected by this regression.
Comment 20 Angelo 2012-02-01 22:01:52 UTC
I am using kernel 3.1.6 with the open-source radeon driver (gallium) on a Radeon Turks [Radeon HD 6670].

I had a very strange issue that may be connected with this drm issue. Here is the story:

1) Both drm and Xorg were reporting a bad EDID checksum.
After a deep investigation, also using other tools like get-edid, I found one byte of the fixed EDID header damaged: it had changed from 0xFF to 0x80. Setting it back to 0xFF gave the correct checksum. So I opened the monitor and de-soldered the I2C EEPROM; once dumped, the wrong byte really was there. I reprogrammed only that byte and the problem was solved: no more EDID checksum errors.
But my question is: since the monitor is about 3 years old, what could have damaged this byte? The radeon driver? Could some kernel module also write a wrong byte to the monitor's I2C EEPROM?
2) After many days, drm (kernel) again started to report checksum errors. It shows a wrong byte in the EDID block (I can tell because I have the original dump). This message disappears if I power the monitor off for a few seconds and back on.
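
Comparing a fresh EDID read against a known-good dump, as described above, can be sketched like this. The blocks here are synthetic, the function name is invented for illustration, and the choice of header offset 1 for the damaged byte is arbitrary, since the comment does not say which header byte was affected.

```python
def differing_bytes(reference: bytes, current: bytes) -> list:
    """Return (offset, reference_byte, current_byte) for every mismatch."""
    if len(reference) != len(current):
        raise ValueError("dumps must have the same length")
    return [
        (offset, ref, cur)
        for offset, (ref, cur) in enumerate(zip(reference, current))
        if ref != cur
    ]


# Synthetic 128-byte block: fixed header, zero payload, checksum byte 6.
good = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00]) + bytes(119) + bytes([6])

# Simulate one header byte flipping from 0xFF to 0x80.
damaged = bytearray(good)
damaged[1] = 0x80
damaged = bytes(damaged)

good_remainder = sum(good) % 256        # 0 for a valid block
damaged_remainder = sum(damaged) % 256  # (0x80 - 0xFF) mod 256 = 129
```

A single byte going from 0xFF to 0x80 lowers the byte sum by 127, so an otherwise valid block ends up with a remainder of 129 instead of 0.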

My motherboard does not seem to have i915:
lspci shows:
PCI bridge: Intel Corporation 82801I (ICH9 Family)

Is it possible that this EDID bad-checksum issue has been fixed in newer kernels?

many thanks
regards
Comment 21 Markus Heinz 2012-03-24 17:35:43 UTC
Linux versions 3.1, 3.2 and 3.3 are affected, too.
Comment 22 Daniel Vetter 2012-03-24 17:46:26 UTC
Ok, sounds like we just need to quirk away the opregion vbt for this machine. Can you please attach the output of dmidecode so I can prep the corresponding patch?
Comment 23 Markus Heinz 2012-03-24 18:35:44 UTC
Created attachment 72699 [details]
Output of dmidecode

I have attached the output of dmidecode.
Comment 24 Daniel Vetter 2012-03-24 19:16:50 UTC
Created attachment 72700 [details]
dmi quirk-away broken vbt

Please test this patch and if it works, supply your tested-by.
Comment 25 Markus Heinz 2012-03-24 20:09:58 UTC
I have successfully tested the patch from comment #24 against a vanilla 3.3 kernel. The screen resolution in the console and under X11 is as expected.

I have found this log message:

[drm:intel_no_opregion_vbt_callback], Falling back to manually reading VBT fromVBIOS ROM for ThinkCentre A57

Thanks a lot. Will this patch be included in the next kernel release (3.3.1)?
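
The idea of such a DMI quirk can be illustrated with a small sketch. It is not the patch itself: the match strings below are placeholders rather than the values the patch uses, the helper names are hypothetical, and substring matching is an assumption about how the DMI fields are compared. Only the "ThinkCentre A57" label is taken from the log message above.

```python
# Hypothetical quirk table; vendor and product strings are placeholders.
QUIRKS = [
    {
        "ident": "ThinkCentre A57",
        "matches": {
            "sys_vendor": "EXAMPLE VENDOR",
            "product_name": "EXAMPLE-MODEL",
        },
    },
]


def matching_quirk(dmi: dict, quirks: list = QUIRKS):
    """Return the ident of the first quirk whose fields all match, else None."""
    for quirk in quirks:
        if all(
            needle in dmi.get(field, "")
            for field, needle in quirk["matches"].items()
        ):
            return quirk["ident"]
    return None


def use_opregion_vbt(dmi: dict) -> bool:
    """Trust the OpRegion VBT unless the machine is on the quirk list."""
    return matching_quirk(dmi) is None
```

On a quirked machine the driver would skip the OpRegion copy and fall back to reading the VBT from the VBIOS ROM, which is what the log message reports.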
Comment 26 Daniel Vetter 2012-03-24 22:50:40 UTC
I've submitted it for inclusion into 3.4-rc. As soon as it lands there, it will get applied to older kernels by the stable team.
Comment 27 Florian Mickler 2012-04-16 21:18:41 UTC
A patch referencing this bug report has been merged in Linux v3.4-rc2:

commit 25e341cfc33d94435472983825163e97fe370a6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Mar 24 23:51:30 2012 +0100

    drm/i915: quirk away broken OpRegion VBT