Bug 46381
Summary: | [i915GM] Null pointer dereference in the sdvo i2c stuff | ||
---|---|---|---|
Product: | Drivers | Reporter: | Bjoern Franke (bjo) |
Component: | Video(DRI - Intel) | Assignee: | drivers_video-dri-intel (drivers_video-dri-intel) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | bgamari, daniel, florian, thomas |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.5.2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg up to / including the i915 oops
new execption
dmesg with drm.debug / 3.7rc4
testpatch to make intel_sdvo bigger
dmesg with drm.debug and applied testpatch
drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures |
Description
Bjoern Franke
2012-08-23 16:45:56 UTC
Believed to be fixed with:

commit cee25168e9c4ef7f9417632af2dc78b8521dfda7
Author: Jani Nikula <jani.nikula@intel.com>
Date: Mon Aug 13 17:33:02 2012 +0300

    drm/i915: ensure i2c adapter is all set before adding it

Which is part of 3.6-rc3. If this is not the case, please reopen the bug report, thanks.

No, with 3.6rc3 it still crashes sometimes. But from time to time it boots, and then it runs into another issue which has to do with wq_worker.

[ 23.937533] BUG: unable to handle kernel NULL pointer dereference at 00000008
[ 23.937549] IP: [<f802e456>] i2c_transfer+0x16/0x90 [i2c_core]
[ 23.937565] *pde = 00000000
[ 23.937571] Oops: 0000 [#1] PREEMPT SMP
[ 23.937579] Modules linked in: b43 snd_intel8x0(+) bcma snd_intel8x0m(+) snd_ac97_codec mac80211 i915(+) ac97_bus cfg80211 rfkill snd_pcm ssb tg3 i2c_algo_bit drm_kms_helper snd_page_alloc drm i2c_i801 snd_timer mmc_core snd joydev pcmcia soundcore iTCO_wdt yenta_socket irda dell_laptop libphy i2c_core gpio_ich iTCO_vendor_support intel_agp intel_gtt acpi_cpufreq lpc_ich mperf serio_raw psmouse pcmcia_rsrc processor pcmcia_core agpgart microcode crc_ccitt pcspkr thermal dcdbas video button evdev battery ac nfs lockd sunrpc fscache i8k capi kernelcapi autofs4 ext4 crc16 jbd2 mbcache aes_i586 ablk_helper cryptd aes_generic xts gf128mul dm_crypt dm_mod sd_mod pata_acpi ata_generic ata_piix libata scsi_mod uhci_hcd ehci_hcd usbcore usb_common
[ 23.937702] Pid: 184, comm: systemd-udevd Not tainted 3.6.0-rc3-mainline #1 Dell Inc. Latitude D410 /0H8384
[ 23.937714] EIP: 0060:[<f802e456>] EFLAGS: 00010292 CPU: 0
[ 23.937723] EIP is at i2c_transfer+0x16/0x90 [i2c_core]
[ 23.937730] EAX: 00000000 EBX: 00000000 ECX: 00000003 EDX: f54cd580
[ 23.937737] ESI: f54cd598 EDI: f4de6800 EBP: f5673ba4 ESP: f5673b98
[ 23.937744] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 23.937750] CR0: 8005003b CR2: 00000008 CR3: 3558c000 CR4: 000007d0
[ 23.937758] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 23.937764] DR6: ffff0ff0 DR7: 00000400
[ 23.937770] Process systemd-udevd (pid: 184, ti=f5672000 task=f545b910 task.ti=f5672000)
[ 23.937779] Stack:
[ 23.937783] 00000001 f54cd598 f4de6800 f5673bf0 f89f6120 00000004 00000000 00000000
[ 23.937798] f8a1f5c7 f8a1b0d8 0000000b f5673c08 0000000b 0b000001 00000003 f54cd580
[ 23.937812] f5577780 0000000c 09670001 f4de6800 f5f99000 f4e7561c f5673c30 f89f7131
[ 23.937827] Call Trace:
[ 23.937871] [<f89f6120>] intel_sdvo_write_cmd+0x2a0/0x3a0 [i915]
[ 23.937904] [<f89f7131>] intel_sdvo_detect+0x31/0x2e0 [i915]
[ 23.937916] [<c11f002a>] ? snprintf+0x1a/0x20
[ 23.937930] [<f854baf5>] ? drm_get_connector_name+0x45/0x50 [drm]
[ 23.937942] [<f85a1018>] drm_helper_probe_single_connector_modes+0x198/0x320 [drm_kms_helper]
[ 23.937955] [<f859ede1>] drm_fb_helper_probe_connector_modes.isra.2+0x41/0x60 [drm_kms_helper]
[ 23.937967] [<f859fe5c>] drm_fb_helper_initial_config+0x16c/0x1f0 [drm_kms_helper]
[ 23.937979] [<c1128e62>] ? kmem_cache_alloc_trace+0x112/0x120
[ 23.937987] [<c112901d>] ? __kmalloc+0x12d/0x160
[ 23.937994] [<c1128e62>] ? kmem_cache_alloc_trace+0x112/0x120
[ 23.938004] [<f859ee37>] ? drm_fb_helper_single_add_all_connectors+0x37/0xc0 [drm_kms_helper]
[ 23.938040] [<f8a01c96>] intel_fbdev_init+0x76/0xb0 [i915]
[ 23.938066] [<f89c64ff>] i915_driver_load+0x9bf/0xb00 [i915]
[ 23.938091] [<f89c41d0>] ? i915_switcheroo_set_state+0xa0/0xa0 [i915]
[ 23.938107] [<f854863b>] drm_get_pci_dev+0x13b/0x260 [drm]
[ 23.938138] [<f8a0ccd7>] i915_pci_probe+0x4b/0x55 [i915]
[ 23.938148] [<c120cb87>] pci_device_probe+0x87/0x110
[ 23.938158] [<c1190e97>] ? sysfs_create_link+0x17/0x20
[ 23.938169] [<c129afdc>] driver_probe_device+0x5c/0x1e0
[ 23.938177] [<c129b1f1>] __driver_attach+0x91/0xa0
[ 23.938185] [<c129b160>] ? driver_probe_device+0x1e0/0x1e0
[ 23.938194] [<c12997a2>] bus_for_each_dev+0x42/0x80
[ 23.938202] [<c129abce>] driver_attach+0x1e/0x20
[ 23.938209] [<c129b160>] ? driver_probe_device+0x1e0/0x1e0
[ 23.938218] [<c129a807>] bus_add_driver+0x167/0x250
[ 23.938226] [<c120ca00>] ? pci_dev_put+0x20/0x20
[ 23.938233] [<c129b7da>] driver_register+0x6a/0x160
[ 23.938243] [<c10bc3b6>] ? get_tracepoint+0x16/0x1b0
[ 23.938252] [<f8762000>] ? 0xf8761fff
[ 23.938259] [<c120ce12>] __pci_register_driver+0x42/0xb0
[ 23.938268] [<f8762000>] ? 0xf8761fff
[ 23.938280] [<f854885d>] drm_pci_init+0xfd/0x110 [drm]
[ 23.938289] [<f8762000>] ? 0xf8761fff
[ 23.938313] [<f876205e>] i915_init+0x5e/0x60 [i915]
[ 23.938322] [<c1001202>] do_one_initcall+0x112/0x160
[ 23.938332] [<c105fd2a>] ? __blocking_notifier_call_chain+0x4a/0x80
[ 23.938343] [<c10978a5>] sys_init_module+0xe35/0x1ae0
[ 23.938361] [<c13cf85f>] sysenter_do_call+0x12/0x28
[ 23.938367] Code: e8 d0 ff ff ff 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 c3 <8b> 40 08 89 d6 89 cf 8b 00 85 c0 74 64 89 e0 25 00 e0 ff ff f7
[ 23.938428] EIP: [<f802e456>] i2c_transfer+0x16/0x90 [i2c_core] SS:ESP 0068:f5673b98
[ 23.938441] CR2: 0000000000000008
[ 23.938448] ---[ end trace 09eb6dd349b1fa39 ]---

Can you please boot with drm.debug=0xe added to your kernel cmdline and attach the full dmesg (up to and including the i915 oops)?

Created attachment 78641 [details]
dmesg up to / including the i915 oops
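The key facts of the oops above can be pulled out mechanically. The following is a minimal sketch, not a general oops parser: the helper name `summarize_oops` is hypothetical, and the regular expressions assume the 32-bit x86 layout seen in this report. It extracts the faulting symbol, the fault address from CR2, the value of EAX, and the opcode bytes starting at the marked `<8b>` byte. The bytes `8b 40 08` decode to `mov 0x8(%eax),%eax`, so with EAX at zero the load targets address 8, which matches CR2. That is consistent with reading a field 8 bytes into a NULL structure pointer, presumably the adapter handed to `i2c_transfer`.

```python
import re

# Excerpt of the oops from this report (only the lines the sketch needs).
OOPS = """\
[ 23.937533] BUG: unable to handle kernel NULL pointer dereference at 00000008
[ 23.937723] EIP is at i2c_transfer+0x16/0x90 [i2c_core]
[ 23.937730] EAX: 00000000 EBX: 00000000 ECX: 00000003 EDX: f54cd580
[ 23.937750] CR0: 8005003b CR2: 00000008 CR3: 3558c000 CR4: 000007d0
[ 23.938367] Code: e8 d0 ff ff ff 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 c3 <8b> 40 08 89 d6 89 cf 8b 00 85 c0 74 64 89 e0 25 00 e0 ff ff f7
"""


def summarize_oops(text):
    """Hypothetical helper: extract the faulting symbol, address and opcode bytes."""
    where = re.search(
        r"EIP is at (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)(?: \[(\w+)\])?", text
    )
    cr2 = re.search(r"CR2: ([0-9a-f]+)", text)
    eax = re.search(r"EAX: ([0-9a-f]+)", text)
    code = re.search(r"Code: (.*)", text)
    # The byte wrapped in <...> is the first byte of the faulting instruction.
    tail = code.group(1).split("<", 1)[1].replace(">", "").split()
    return {
        "function": where.group(1),
        "offset": int(where.group(2), 16),
        "module": where.group(4),
        "fault_address": int(cr2.group(1), 16),
        "eax": int(eax.group(1), 16),
        "faulting_bytes": tail[:3],
    }


info = summarize_oops(OOPS)
# 8b 40 08 is "mov 0x8(%eax),%eax": a load from EAX + 8.
print(info["function"], hex(info["fault_address"]), info["faulting_bytes"])
```

The check `info["eax"] + 8 == info["fault_address"]` ties the register dump to the fault address without needing a disassembler.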
This is indeed a very strange bug: We set up SDVOB and can't find anything for SDVOC. But later on we die trying to do a transfer on SDVOC, even though that thing isn't set up and even though we managed to do a successful i2c transfer when probing.

To rule out any stupid timing bugs or issues brought up by other things scribbling over our driver, can you please boot with kms disabled (i915.modeset=0), and then reload the i915.ko module manually after boot with kms enabled (you need to kill X for that):

modprobe i915 modeset=1

Reloading the module works without the issue:

[ 146.113408] [drm] Module unloaded
[ 174.484317] Linux agpgart interface v0.103
[ 174.495753] [drm] Initialized drm 1.1.0 20060810
[ 174.499794] ACPI: Video Device [VID] (multi-head: yes rom: no post: no)
[ 174.501305] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A03:00/LNXVIDEO:00/input/input8
[ 174.503208] [Firmware Bug]: Duplicate ACPI video bus devices for the same VGA controller, please try module parameter "video.allow_duplicates=1"if the current driver doesn't work.
[ 174.503884] input: Lid Switch as /devices/LNXSYSTM:00/device:00/PNP0C0D:00/input/input9
[ 174.506432] ACPI: Lid Switch [LID]
[ 174.506520] input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input10
[ 174.506574] ACPI: Power Button [PBTN]
[ 174.506647] input: Sleep Button as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input11
[ 174.506734] ACPI: Sleep Button [SBTN]
[ 174.508345] agpgart-intel 0000:00:00.0: Intel 915GM Chipset
[ 174.508393] agpgart-intel 0000:00:00.0: detected gtt size: 262144K total, 262144K mappable
[ 174.508852] agpgart-intel 0000:00:00.0: detected 8192K stolen memory
[ 174.512137] agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xc0000000
[ 174.538093] i915 0000:00:02.0: setting latency timer to 64
[ 174.539122] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[ 174.539127] [drm] Driver supports precise vblank timestamp query.
[ 174.576031] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 174.660169] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[ 174.743506] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[ 174.800159] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
[ 175.362185] [drm] initialized overlay support
[ 175.502520] fbcon: inteldrmfb (fb0) is primary device
[ 176.222506] Console: switching to colour frame buffer device 128x48
[ 176.315189] fb0: inteldrmfb frame buffer device
[ 176.315191] drm: registered panic notifier
[ 176.315199] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

Hm, that smells like something is corrupting memory then :( Can you please try this patch to use a separate slab for i915 gem objects: https://patchwork.kernel.org/patch/712051/

Also please use the slub allocator and boot with slub_debug=full, which hopefully catches any rogue writes from other drivers.

Running with the slub allocator and slub_debug, i915_gem appears in /sys/kernel/slab, but there is nothing special in the log :/

SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1

Can you also check the patch and running with slub_debug separately? Also, at the same time that i915.ko loads and dies, other drivers load. Can you check whether disabling those works around the issue? From looking at dmesg, the interesting thing going on is the wireless driver b43.

I tried slub_debug=FZUP,i915_gem_object now, and I got another exception (see attachment).

Created attachment 78721 [details]
new execption
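For readers unfamiliar with the `slub_debug=FZUP,i915_gem_object` value used above: per the kernel's SLUB documentation, the letters before the comma select debug options (F for sanity checks, Z for red zoning, U for user tracking of allocation and free callers, P for poisoning), and the name after the comma limits debugging to matching slab caches. The snippet below is only an illustrative decoder with a hypothetical helper name, `parse_slub_debug`; it does not reproduce the kernel's own parsing and covers only the common flags.

```python
# Meanings taken from the kernel's SLUB documentation; only common flags are listed.
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking (alloc/free callers)",
    "T": "trace",
}


def parse_slub_debug(value):
    """Hypothetical helper: split a slub_debug value into flag meanings and a cache selector."""
    flags, _, cache = value.partition(",")
    return {
        "flags": [SLUB_DEBUG_FLAGS.get(flag, "unknown flag") for flag in flags],
        "cache": cache or None,  # None means: apply to all slab caches
    }


print(parse_slub_debug("FZUP,i915_gem_object"))
```

Restricting the options to one cache keeps the overhead down while still catching writes past the end of the objects in that cache.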
(In reply to comment #11)
> Created an attachment (id=78721) [details]
> new execption

Well, that's just the scheduler noticing that a few threads are blocked, since the task that died in the first oops is holding a rather central kms mutex. In other words: Totally expected to happen.

Disabling b43 and tg3 has no effect on the issue.

Might be a duplicate of bug #46631. Can you please try the little patch from that bug?

Unfortunately, that patch does not fix this issue on 3.6rc3.

Can you please retest this on the latest 3.7-rc kernels? If it's still an issue, I guess we need the bisect result to make progress on this here.

Still an issue on 3.7rc4 :(

Ok, can you please try to bisect where the original issue was introduced? Also, please attach an updated drm.debug=0xe dmesg up to where the BUG happens.

Created attachment 86021 [details]
dmesg with drm.debug / 3.7rc4
Created attachment 86091 [details]
testpatch to make intel_sdvo bigger
Please try out what happens when you apply this testpatch on top of any broken kernel (it should apply pretty much everywhere).
Created attachment 86151 [details]
dmesg with drm.debug and applied testpatch
Created attachment 86171 [details]
drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures
Please try the attached patch.
It works! Fine! Thanks!

(In reply to comment #23)
> It works! Fine! Thanks!

Just to make sure: Any other bad side-effects like non-working outputs?

Fix merged into drm-intel-fixes:

commit d0ddfbd3d1346c1f481ec2289eef350cdba64b42
Author: Jani Nikula <jani.nikula@intel.com>
Date: Mon Nov 12 18:31:35 2012 +0200

    drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures

@Daniel: No side-effects recognized.
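The title of the merged fix, together with the earlier observation that the crash happened on SDVOC even though it was never set up, suggests a failure pattern that the toy model below illustrates. This is an assumption drawn from the patch title, not the driver code: all class and function names here are hypothetical, and "freed" memory is modeled by setting a field to `None`. An output's init registers a connector, then fails and tears down the encoder. Without cleanup the connector stays registered and a later detect pass trips over it; with cleanup the connector is removed along with the encoder.

```python
class Encoder:
    """Stands in for the per-output encoder state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.i2c = object()  # placeholder for the i2c adapter

    def free(self):
        self.i2c = None  # models the encoder being torn down and freed


class Connector:
    def __init__(self, name, encoder):
        self.name = name
        self.encoder = encoder  # back-reference used later by detect


def init_output(registry, name, fail, cleanup):
    """Register a connector, then optionally fail and tear the encoder down."""
    encoder = Encoder(name)
    registry.append(Connector(name + "-connector", encoder))
    if fail:
        if cleanup:
            # The fix pattern: drop every connector attached to this encoder.
            registry[:] = [c for c in registry if c.encoder is not encoder]
        encoder.free()
        return False
    return True


def detect_all(registry):
    """Probe each registered connector through its encoder's i2c adapter."""
    for connector in registry:
        if connector.encoder.i2c is None:
            raise RuntimeError(connector.name + ": transfer on a torn-down encoder")
    return [connector.name for connector in registry]


connectors = []
init_output(connectors, "SDVOB", fail=False, cleanup=True)
init_output(connectors, "SDVOC", fail=True, cleanup=True)
print(detect_all(connectors))
```

In Python the stale reference raises a tidy exception; in the kernel the equivalent access would go through a dangling pointer, which would also fit the memory-corruption suspicion raised earlier in the thread.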