Bug 46381

Summary: [i915GM] Null pointer dereference in the sdvo i2c stuff
Product: Drivers
Reporter: Bjoern Franke (bjo)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: bgamari, daniel, florian, thomas
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.5.2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg up to / including the i915 oops
new execption
dmesg with drm.debug / 3.7rc4
testpatch to make intel_sdvo bigger
dmesg with drm.debug and applied testpatch
drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures

Description Bjoern Franke 2012-08-23 16:45:56 UTC
On my Dell Latitude D410, I get the following error and a black screen. It seems to be the same error that was fixed in bug #35242.


[   48.386892] [drm] initialized overlay support
[   48.482110] BUG: unable to handle kernel NULL pointer dereference at 00000008
[   48.482125] IP: [<f8f023e0>] i2c_transfer+0x10/0xd0 [i2c_core]
[   48.482141] *pde = 00000000 
[   48.482147] Oops: 0000 [#1] PREEMPT SMP 
[   48.482155] Modules linked in: b43 bcma snd_intel8x0(+) snd_intel8x0m(+) mac80211 snd_ac97_codec ac97_bus i915(+) snd_pcm cfg80211 iTCO_wdt ssb iTCO_vendor_support joydev pcmcia snd_page_alloc btusb snd_timer bluetooth intel_agp intel_gtt mmc_core i2c_algo_bit i2c_i801 drm_kms_helper gpio_ich lpc_ich snd yenta_socket pcmcia_rsrc soundcore drm tg3 rfkill agpgart i2c_core serio_raw pcspkr pcmcia_core libphy irda acpi_cpufreq psmouse mperf dell_laptop evdev dcdbas crc_ccitt processor thermal ac video button battery microcode nfs nfs_acl lockd auth_rpcgss sunrpc fscache i8k capi kernelcapi autofs4 ext4 crc16 jbd2 mbcache aes_i586 cryptd aes_generic xts gf128mul hid_generic usbhid hid dm_crypt dm_mod sd_mod pata_acpi ata_generic ata_piix libata scsi_mod uhci_hcd ehci_hcd usbcore usb_common
[   48.482281] 
[   48.482287] Pid: 178, comm: systemd-udevd Not tainted 3.5.2-1-ARCH #1 Dell Inc. Latitude D410                   /0H8384
[   48.482301] EIP: 0060:[<f8f023e0>] EFLAGS: 00010286 CPU: 0
[   48.482310] EIP is at i2c_transfer+0x10/0xd0 [i2c_core]
[   48.482317] EAX: 00000000 EBX: 00000000 ECX: 00000003 EDX: f4ecc200
[   48.482324] ESI: f4ecc218 EDI: f4db3800 EBP: f5333ba4 ESP: f5333b88
[   48.482331]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   48.482337] CR0: 8005003b CR2: 00000008 CR3: 35003000 CR4: 000007d0
[   48.482345] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   48.482351] DR6: ffff0ff0 DR7: 00000400
[   48.482358] Process systemd-udevd (pid: 178, ti=f5332000 task=f5ba2530 task.ti=f5332000)
[   48.482366] Stack:
[   48.482370]  f8a6e390 00000024 000080d0 f4ecc200 00000001 f4ecc218 f4db3800 f5333bf0
[   48.482385]  f8a6e5c0 00000004 00000000 00000000 f8a9631b f8a92014 0000000b f5333c08
[   48.482398]  0000000b 0b000001 00000003 f4ecc200 f4e2c040 0000000c 09000001 f4db3800
[   48.482412] Call Trace:
[   48.482454]  [<f8a6e390>] ? intel_sdvo_write_cmd+0x60/0x380 [i915]
[   48.482485]  [<f8a6e5c0>] intel_sdvo_write_cmd+0x290/0x380 [i915]
[   48.482517]  [<f8a6f60b>] intel_sdvo_detect+0x2b/0x2d0 [i915]
[   48.482529]  [<c11f03ca>] ? snprintf+0x1a/0x20
[   48.482544]  [<f905fbc5>] ? drm_get_connector_name+0x45/0x50 [drm]
[   48.482555]  [<f9135078>] drm_helper_probe_single_connector_modes+0x198/0x320 [drm_kms_helper]
[   48.482569]  [<f9132e01>] drm_fb_helper_probe_connector_modes.isra.2+0x41/0x60 [drm_kms_helper]
[   48.482582]  [<f9133ebc>] drm_fb_helper_initial_config+0x16c/0x1f0 [drm_kms_helper]
[   48.482594]  [<c1124f0d>] ? __kmalloc+0x12d/0x160
[   48.482602]  [<c1124902>] ? kmem_cache_alloc_trace+0x112/0x120
[   48.482611]  [<f9132e57>] ? drm_fb_helper_single_add_all_connectors+0x37/0xc0 [drm_kms_helper]
[   48.482648]  [<f8a79836>] intel_fbdev_init+0x76/0xb0 [i915]
[   48.482674]  [<f8a41610>] i915_driver_load+0xa30/0xb10 [i915]
[   48.482699]  [<f8a3f270>] ? i915_switcheroo_set_state+0xa0/0xa0 [i915]
[   48.482716]  [<f905c85b>] drm_get_pci_dev+0x13b/0x260 [drm]
[   48.482747]  [<f8a8477f>] i915_pci_probe+0x13/0x1d [i915]
[   48.482756]  [<c120c727>] pci_device_probe+0x87/0x110
[   48.482765]  [<c1191ae7>] ? sysfs_create_link+0x17/0x20
[   48.482775]  [<c1299fd3>] driver_probe_device+0x63/0x1e0
[   48.482783]  [<c129a1e1>] __driver_attach+0x91/0xa0
[   48.482791]  [<c129a150>] ? driver_probe_device+0x1e0/0x1e0
[   48.482799]  [<c1298782>] bus_for_each_dev+0x42/0x80
[   48.482807]  [<c1299bae>] driver_attach+0x1e/0x20
[   48.482815]  [<c129a150>] ? driver_probe_device+0x1e0/0x1e0
[   48.482823]  [<c1299807>] bus_add_driver+0x187/0x270
[   48.482830]  [<c120c5a0>] ? pci_dev_put+0x20/0x20
[   48.482838]  [<c129a7ca>] driver_register+0x6a/0x140
[   48.482847]  [<c10b7d56>] ? get_tracepoint+0x16/0x1b0
[   48.482856]  [<f86e3000>] ? 0xf86e2fff
[   48.482863]  [<c120c9b2>] __pci_register_driver+0x42/0xb0
[   48.482872]  [<f86e3000>] ? 0xf86e2fff
[   48.482885]  [<f905ca7d>] drm_pci_init+0xfd/0x110 [drm]
[   48.482894]  [<f86e3000>] ? 0xf86e2fff
[   48.482918]  [<f86e3085>] i915_init+0x85/0x87 [i915]
[   48.482926]  [<c10011f2>] do_one_initcall+0x112/0x160
[   48.482935]  [<c105c17a>] ? __blocking_notifier_call_chain+0x4a/0x80
[   48.482945]  [<c1093585>] sys_init_module+0xe35/0x1ae0
[   48.482957]  [<c13cc05f>] sysenter_do_call+0x12/0x28
[   48.482963] Code: 00 00 00 8d 42 d8 e8 d0 ff ff ff 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 10 3e 8d 74 26 00 89 c3 <8b> 40 08 89 55 f0 89 4d ec 8b 10 85 d2 0f 84 99 00 00 00 89 e0 
[   48.483029] EIP: [<f8f023e0>] i2c_transfer+0x10/0xd0 [i2c_core] SS:ESP 0068:f5333b88
[   48.483042] CR2: 0000000000000008
[   48.483049] ---[ end trace 8999b73e40e49748 ]---
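
A note on reading the oops above: the faulting address (CR2 = 00000008) together with EAX = 00000000 and the marked instruction bytes `<8b> 40 08` (a load from offset 8 of EAX) suggests that i2c_transfer() was handed a NULL adapter pointer and faulted while reading a field 8 bytes into it. The following is a minimal sketch of that arithmetic, assuming a simplified 32-bit layout in which the adapter starts with two 4-byte fields followed by the pointer to its transfer callbacks. The struct and function names are hypothetical and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of the first fields of an i2c adapter.
 * Pointers are modelled as uint32_t so the layout is host-independent. */
struct model_i2c_adapter32 {
    uint32_t owner;      /* pointer-sized field on a 32-bit kernel */
    uint32_t dev_class;  /* 4-byte integer field */
    uint32_t algo;       /* pointer to the transfer callbacks */
};

/* Address the CPU touches when it loads `algo` through a base pointer. */
static uint32_t algo_load_address(uint32_t adapter_base)
{
    return adapter_base + (uint32_t)offsetof(struct model_i2c_adapter32, algo);
}

/* With a NULL adapter (EAX = 0 in the oops) the load hits 0x8, matching CR2. */
static void check_against_oops(void)
{
    assert(algo_load_address(0) == 0x00000008u);
}
```

Under these assumptions, the crash is a read through a NULL adapter rather than a read of a corrupted non-NULL pointer, which is consistent with the "NULL pointer dereference at 00000008" line.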
Comment 1 Daniel Vetter 2012-08-24 00:09:17 UTC
Believed to be fixed with:

commit cee25168e9c4ef7f9417632af2dc78b8521dfda7
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Mon Aug 13 17:33:02 2012 +0300

    drm/i915: ensure i2c adapter is all set before adding it

Which is part of 3.6-rc3. If this is not the case, please reopen the bug report, thanks.
Comment 2 Bjoern Franke 2012-08-28 00:19:01 UTC
No, with 3.6rc3 it still crashes sometimes. But from time to time it boots, and then it runs into another issue which has to do with wq_worker.

[   23.937533] BUG: unable to handle kernel NULL pointer dereference at 00000008
[   23.937549] IP: [<f802e456>] i2c_transfer+0x16/0x90 [i2c_core]
[   23.937565] *pde = 00000000 
[   23.937571] Oops: 0000 [#1] PREEMPT SMP 
[   23.937579] Modules linked in: b43 snd_intel8x0(+) bcma snd_intel8x0m(+) snd_ac97_codec mac80211 i915(+) ac97_bus cfg80211 rfkill snd_pcm ssb tg3 i2c_algo_bit drm_kms_helper snd_page_alloc drm i2c_i801 snd_timer mmc_core snd joydev pcmcia soundcore iTCO_wdt yenta_socket irda dell_laptop libphy i2c_core gpio_ich iTCO_vendor_support intel_agp intel_gtt acpi_cpufreq lpc_ich mperf serio_raw psmouse pcmcia_rsrc processor pcmcia_core agpgart microcode crc_ccitt pcspkr thermal dcdbas video button evdev battery ac nfs lockd sunrpc fscache i8k capi kernelcapi autofs4 ext4 crc16 jbd2 mbcache aes_i586 ablk_helper cryptd aes_generic xts gf128mul dm_crypt dm_mod sd_mod pata_acpi ata_generic ata_piix libata scsi_mod uhci_hcd ehci_hcd usbcore usb_common
[   23.937702] Pid: 184, comm: systemd-udevd Not tainted 3.6.0-rc3-mainline #1 Dell Inc. Latitude D410                   /0H8384
[   23.937714] EIP: 0060:[<f802e456>] EFLAGS: 00010292 CPU: 0
[   23.937723] EIP is at i2c_transfer+0x16/0x90 [i2c_core]
[   23.937730] EAX: 00000000 EBX: 00000000 ECX: 00000003 EDX: f54cd580
[   23.937737] ESI: f54cd598 EDI: f4de6800 EBP: f5673ba4 ESP: f5673b98
[   23.937744]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   23.937750] CR0: 8005003b CR2: 00000008 CR3: 3558c000 CR4: 000007d0
[   23.937758] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   23.937764] DR6: ffff0ff0 DR7: 00000400
[   23.937770] Process systemd-udevd (pid: 184, ti=f5672000 task=f545b910 task.ti=f5672000)
[   23.937779] Stack:
[   23.937783]  00000001 f54cd598 f4de6800 f5673bf0 f89f6120 00000004 00000000 00000000
[   23.937798]  f8a1f5c7 f8a1b0d8 0000000b f5673c08 0000000b 0b000001 00000003 f54cd580
[   23.937812]  f5577780 0000000c 09670001 f4de6800 f5f99000 f4e7561c f5673c30 f89f7131
[   23.937827] Call Trace:
[   23.937871]  [<f89f6120>] intel_sdvo_write_cmd+0x2a0/0x3a0 [i915]
[   23.937904]  [<f89f7131>] intel_sdvo_detect+0x31/0x2e0 [i915]
[   23.937916]  [<c11f002a>] ? snprintf+0x1a/0x20
[   23.937930]  [<f854baf5>] ? drm_get_connector_name+0x45/0x50 [drm]
[   23.937942]  [<f85a1018>] drm_helper_probe_single_connector_modes+0x198/0x320 [drm_kms_helper]
[   23.937955]  [<f859ede1>] drm_fb_helper_probe_connector_modes.isra.2+0x41/0x60 [drm_kms_helper]
[   23.937967]  [<f859fe5c>] drm_fb_helper_initial_config+0x16c/0x1f0 [drm_kms_helper]
[   23.937979]  [<c1128e62>] ? kmem_cache_alloc_trace+0x112/0x120
[   23.937987]  [<c112901d>] ? __kmalloc+0x12d/0x160
[   23.937994]  [<c1128e62>] ? kmem_cache_alloc_trace+0x112/0x120
[   23.938004]  [<f859ee37>] ? drm_fb_helper_single_add_all_connectors+0x37/0xc0 [drm_kms_helper]
[   23.938040]  [<f8a01c96>] intel_fbdev_init+0x76/0xb0 [i915]
[   23.938066]  [<f89c64ff>] i915_driver_load+0x9bf/0xb00 [i915]
[   23.938091]  [<f89c41d0>] ? i915_switcheroo_set_state+0xa0/0xa0 [i915]
[   23.938107]  [<f854863b>] drm_get_pci_dev+0x13b/0x260 [drm]
[   23.938138]  [<f8a0ccd7>] i915_pci_probe+0x4b/0x55 [i915]
[   23.938148]  [<c120cb87>] pci_device_probe+0x87/0x110
[   23.938158]  [<c1190e97>] ? sysfs_create_link+0x17/0x20
[   23.938169]  [<c129afdc>] driver_probe_device+0x5c/0x1e0
[   23.938177]  [<c129b1f1>] __driver_attach+0x91/0xa0
[   23.938185]  [<c129b160>] ? driver_probe_device+0x1e0/0x1e0
[   23.938194]  [<c12997a2>] bus_for_each_dev+0x42/0x80
[   23.938202]  [<c129abce>] driver_attach+0x1e/0x20
[   23.938209]  [<c129b160>] ? driver_probe_device+0x1e0/0x1e0
[   23.938218]  [<c129a807>] bus_add_driver+0x167/0x250
[   23.938226]  [<c120ca00>] ? pci_dev_put+0x20/0x20
[   23.938233]  [<c129b7da>] driver_register+0x6a/0x160
[   23.938243]  [<c10bc3b6>] ? get_tracepoint+0x16/0x1b0
[   23.938252]  [<f8762000>] ? 0xf8761fff
[   23.938259]  [<c120ce12>] __pci_register_driver+0x42/0xb0
[   23.938268]  [<f8762000>] ? 0xf8761fff
[   23.938280]  [<f854885d>] drm_pci_init+0xfd/0x110 [drm]
[   23.938289]  [<f8762000>] ? 0xf8761fff
[   23.938313]  [<f876205e>] i915_init+0x5e/0x60 [i915]
[   23.938322]  [<c1001202>] do_one_initcall+0x112/0x160
[   23.938332]  [<c105fd2a>] ? __blocking_notifier_call_chain+0x4a/0x80
[   23.938343]  [<c10978a5>] sys_init_module+0xe35/0x1ae0
[   23.938361]  [<c13cf85f>] sysenter_do_call+0x12/0x28
[   23.938367] Code: e8 d0 ff ff ff 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 c3 <8b> 40 08 89 d6 89 cf 8b 00 85 c0 74 64 89 e0 25 00 e0 ff ff f7
[   23.938428] EIP: [<f802e456>] i2c_transfer+0x16/0x90 [i2c_core] SS:ESP 0068:f5673b98
[   23.938441] CR2: 0000000000000008
[   23.938448] ---[ end trace 09eb6dd349b1fa39 ]---
Comment 3 Daniel Vetter 2012-08-28 07:37:31 UTC
Can you please boot with drm.debug=0xe added to your kernel cmdline and attach the full dmesg (up to and including the i915 oops)?
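
For reference, drm.debug is a bitmask of message categories. The sketch below decodes such a mask, assuming the category bits used by kernels of this era (0x01 core, 0x02 driver, 0x04 KMS, 0x08 PRIME); the helper name and the table are illustrative only and are not kernel code.

```c
#include <stdio.h>
#include <string.h>

/* Write the names of the debug categories selected by `mask` into `buf`,
 * separated by spaces. The bit assignments are an assumption (see above). */
static void drm_debug_names(unsigned int mask, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } cats[] = {
        { 0x01u, "core" }, { 0x02u, "driver" },
        { 0x04u, "kms"  }, { 0x08u, "prime"  },
    };
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof(cats) / sizeof(cats[0]); i++) {
        size_t used;

        if (!(mask & cats[i].bit))
            continue;
        used = strlen(buf);
        /* snprintf truncates and always terminates within the remaining space */
        snprintf(buf + used, len - used, "%s%s", used ? " " : "", cats[i].name);
    }
}
```

Calling drm_debug_names(0xe, buf, sizeof(buf)) fills buf with "driver kms prime", i.e. everything except the core DRM messages.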
Comment 4 Bjoern Franke 2012-08-28 10:23:35 UTC
Created attachment 78641 [details]
dmesg up to / including the i915 oops
Comment 5 Daniel Vetter 2012-08-28 11:29:44 UTC
This is indeed a very strange bug: We set up SDVOB and can't find anything for SDVOC. But later on we die trying to do a transfer on SDVOC, even though that thing isn't set up and even though we managed to do a successful i2c transfer when probing.

To rule out any stupid timing bugs or issues brought up by other things scribbling over our driver, can you please boot with KMS disabled (i915.modeset=0), and then reload the i915.ko module manually after boot with KMS enabled (you need to kill X for that):

modprobe i915 modeset=1
Comment 6 Bjoern Franke 2012-08-28 11:43:46 UTC
Reloading the module works without the issue:
[  146.113408] [drm] Module unloaded
[  174.484317] Linux agpgart interface v0.103
[  174.495753] [drm] Initialized drm 1.1.0 20060810
[  174.499794] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
[  174.501305] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A03:00/LNXVIDEO:00/input/input8
[  174.503208] [Firmware Bug]: Duplicate ACPI video bus devices for the same VGA controller, please try module parameter "video.allow_duplicates=1"if the current driver doesn't work.
[  174.503884] input: Lid Switch as /devices/LNXSYSTM:00/device:00/PNP0C0D:00/input/input9
[  174.506432] ACPI: Lid Switch [LID]
[  174.506520] input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input10
[  174.506574] ACPI: Power Button [PBTN]
[  174.506647] input: Sleep Button as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input11
[  174.506734] ACPI: Sleep Button [SBTN]
[  174.508345] agpgart-intel 0000:00:00.0: Intel 915GM Chipset
[  174.508393] agpgart-intel 0000:00:00.0: detected gtt size: 262144K total, 262144K mappable
[  174.508852] agpgart-intel 0000:00:00.0: detected 8192K stolen memory
[  174.512137] agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xc0000000
[  174.538093] i915 0000:00:02.0: setting latency timer to 64
[  174.539122] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[  174.539127] [drm] Driver supports precise vblank timestamp query.
[  174.576031] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[  174.660169] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[  174.743506] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[  174.800159] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
[  175.362185] [drm] initialized overlay support
[  175.502520] fbcon: inteldrmfb (fb0) is primary device
[  176.222506] Console: switching to colour frame buffer device 128x48
[  176.315189] fb0: inteldrmfb frame buffer device
[  176.315191] drm: registered panic notifier
[  176.315199] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Comment 7 Daniel Vetter 2012-08-28 12:01:20 UTC
Hm, that smells like something is corrupting memory then :(

Can you please try this patch to use a separate slab for i915 gem objects:

https://patchwork.kernel.org/patch/712051/

Also please use the slub allocator and boot with slub_debug=full; that will hopefully catch any rogue writes from other drivers.
Comment 8 Bjoern Franke 2012-08-28 23:08:17 UTC
Running with the slub allocator and slub_debug, i915_gem appears in /sys/kernel/slab, but nothing special in the log :/

SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Comment 9 Daniel Vetter 2012-08-29 07:52:01 UTC
Can you also check the patch and running with slub_debug separately? Also, at the same time that i915.ko loads and dies, other drivers load. Can you check whether disabling those works around the issue? From looking at dmesg, the interesting thing going on is the wireless driver b43.
Comment 10 Bjoern Franke 2012-08-29 11:37:20 UTC
I tried slub_debug=FZUP,i915_gem_object now, and I got another exception (see attachment).
Comment 11 Bjoern Franke 2012-08-29 11:37:54 UTC
Created attachment 78721 [details]
new execption
Comment 12 Daniel Vetter 2012-08-29 11:43:11 UTC
(In reply to comment #11)
> Created an attachment (id=78721) [details]
> new execption

Well, that's just the scheduler noticing that a few threads are blocked, since the task that died in the first oops is holding a rather central kms mutex. In other words: Totally expected to happen.
Comment 13 Bjoern Franke 2012-08-29 11:49:56 UTC
Disabling b43 and tg3 has no effect on the issue.
Comment 14 Daniel Vetter 2012-08-29 15:49:16 UTC
Might be a duplicate of bug #46631

Can you please try the little patch from that bug?
Comment 15 Bjoern Franke 2012-08-29 22:52:16 UTC
Unfortunately, that patch does not fix this issue on 3.6rc3.
Comment 16 Daniel Vetter 2012-11-09 20:44:19 UTC
Can you please retest this on latest 3.7-rc kernels? If it's still an issue, I guess we need the bisect result to make progress on this here.
Comment 17 Bjoern Franke 2012-11-09 20:57:33 UTC
Still an issue on 3.7rc4 :(
Comment 18 Daniel Vetter 2012-11-09 21:06:40 UTC
Ok, can you please try to bisect where the original issue has been introduced? Also, please attach an updated drm.debug=0xe dmesg up to where the BUG happens.
Comment 19 Bjoern Franke 2012-11-09 21:22:51 UTC
Created attachment 86021 [details]
dmesg with drm.debug / 3.7rc4
Comment 20 Daniel Vetter 2012-11-11 19:16:19 UTC
Created attachment 86091 [details]
testpatch to make intel_sdvo bigger

Please try out what happens when you apply this testpatch on top of any broken kernel (it should apply pretty much everywhere).
Comment 21 Bjoern Franke 2012-11-12 06:28:51 UTC
Created attachment 86151 [details]
dmesg with drm.debug and applied testpatch
Comment 22 Jani Nikula 2012-11-12 16:34:04 UTC
Created attachment 86171 [details]
drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures

Please try the attached patch.
Comment 23 Bjoern Franke 2012-11-12 19:00:07 UTC
It works! Fine! Thanks!
Comment 24 Daniel Vetter 2012-11-12 19:07:05 UTC
(In reply to comment #23)
> It works! Fine! Thanks!

Just to make sure: Any other bad side-effects like non-working outputs?
Comment 25 Daniel Vetter 2012-11-12 19:12:46 UTC
Fix merged into drm-intel-fixes:

commit d0ddfbd3d1346c1f481ec2289eef350cdba64b42
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Mon Nov 12 18:31:35 2012 +0200

    drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures
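
Going only by the commit title and the symptoms discussed in this report (no device found on SDVOC, yet a later probe tries an i2c transfer on it), the failure pattern can be modelled roughly as below. This is a simplified, hypothetical sketch and not the i915 code: every name in it is made up. It shows an init routine that registers a connector before it knows whether init will succeed, and a failure path that either removes that connector again or leaves a stale entry behind for a later probe to trip over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
struct encoder { int *bus; };   /* bus stands in for the i2c adapter */
struct connector { struct encoder *enc; struct connector *next; };

static struct connector *connector_list;   /* walked later by the probe code */
static int dummy_bus;

static void connector_register(struct encoder *enc)
{
    struct connector *c = calloc(1, sizeof(*c));

    assert(c != NULL);
    c->enc = enc;
    c->next = connector_list;
    connector_list = c;
}

/* Unlink and free every connector that still points at `enc`. */
static void connectors_remove_for(struct encoder *enc)
{
    struct connector **pp = &connector_list;

    while (*pp) {
        struct connector *c = *pp;

        if (c->enc == enc) {
            *pp = c->next;
            free(c);
        } else {
            pp = &c->next;
        }
    }
}

static size_t connector_count(void)
{
    size_t n = 0;
    const struct connector *c;

    for (c = connector_list; c; c = c->next)
        n++;
    return n;
}

/* Returns 0 on success, -1 on failure. */
static int encoder_init(int device_present, int cleanup_on_failure)
{
    struct encoder *enc = calloc(1, sizeof(*enc));

    if (!enc)
        return -1;
    connector_register(enc);      /* registered before init is known to work */
    if (!device_present) {
        if (cleanup_on_failure)
            connectors_remove_for(enc);
        free(enc);                /* without cleanup, the list keeps a stale entry */
        return -1;
    }
    enc->bus = &dummy_bus;
    return 0;
}
```

In this model, encoder_init(0, 1) fails and leaves the list empty, while encoder_init(0, 0) fails and leaves one connector whose encoder pointer is stale; a later probe that follows that pointer to reach the bus would read freed memory. That would also fit the observation that the crash depended on timing and memory layout.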
Comment 26 Bjoern Franke 2012-11-12 19:32:21 UTC
@Daniel: No side effects noticed.