Hibernate / Thaw: the screen is black with artifacts on thaw. Sometimes an ABRT report is shown after a forced reboot. Forcing a sync, crash, or reboot with SysRq does not work. The machine is not pingable.

Version-Release number of selected component (if applicable):
2.6.34.1-9

How reproducible:
All the time

Steps to Reproduce:
1. Hibernate
2. Thaw

Actual results:
Machine hangs.

Expected results:
Normal thaw

Additional info:
Smolt: http://www.smolts.org/client/show/pub_98f6cfac-8cad-4a3d-a099-e2e2854e64c0
xorg.conf is not present

BUG: unable to handle kernel paging request at a5e89046
IP: [<f7f8ec27>] drm_mode_getconnector+0x295/0x2b9 [drm]
*pdpt = 0000000036091001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:0b:00.0/ssb0:0/ieee80211/phy0/rfkill1/uevent
Modules linked in: aes_i586 aes_generic coretemp ipv6 cpufreq_ondemand acpi_cpufreq fuse uinput arc4 snd_hda_codec_idt ecb snd_hda_intel snd_hda_codec b43 snd_hwdep snd_seq snd_seq_device snd_pcm mac80211 snd_timer cfg80211 snd b44 ssb dell_laptop soundcore dell_wmi i2c_i801 iTCO_wdt snd_page_alloc iTCO_vendor_support rfkill wmi mii sdhci_pci sdhci mmc_core joydev microcode dcdbas firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: kvm]

Pid: 1516, comm: Xorg Not tainted 2.6.34.1-9.fc13.i686.PAE #1 0KD882/MM061
EIP: 0060:[<f7f8ec27>] EFLAGS: 00013293 CPU: 1
EIP is at drm_mode_getconnector+0x295/0x2b9 [drm]
EAX: f36d313b EBX: 00000001 ECX: 00000003 EDX: f36d3e7c
ESI: f69d4000 EDI: f8032b74 EBP: f36d3e60 ESP: f36d3dec
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process Xorg (pid: 1516, ti=f36d2000 task=f3df5940 task.ti=f36d2000)
Stack:
000000d0 f69d7688 000a3e0c f69d4154 0000033b 00000003 00000001 f36d3e7c
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<f7f85ba3>] ? drm_ioctl+0x23c/0x31d [drm]
[<f7f8e992>] ? drm_mode_getconnector+0x0/0x2b9 [drm]
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c057f0e8>] ? file_has_perm+0x8c/0xa6
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c04e485d>] ? vfs_ioctl+0x2c/0x96
[<f7f85967>] ? drm_ioctl+0x0/0x31d [drm]
[<c04e4df3>] ? do_vfs_ioctl+0x488/0x4c6
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c057f38c>] ? selinux_file_ioctl+0x43/0x46
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c04e4e77>] ? sys_ioctl+0x46/0x66
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c0408cdf>] ? sysenter_do_call+0x12/0x28
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
Code: 00 74 17 e8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 eb 05 bf f2 ff ff <ff> 8b 45 90 e8 a5 1f 81 c8 89 f8 8b 55 f0 65 33 15 14 00 00 00
EIP: [<f7f8ec27>] drm_mode_getconnector+0x295/0x2b9 [drm] SS:ESP 0068:f36d3dec
CR2: 00000000a5e89046
Thanks. Was 2.6.33 OK? 2.6.32?
Here's the honest answer.

Short version: I only ran 2.6.33 for a couple of days after the fix for KBZ 13811 was out, and I did see the problem there. It's really hard to tell, because KBZ 13811 (which really was a regression but was neither marked nor treated as one) masked this problem. Before that I was running a 2.6.32.8 from F11 on F12, because that one particular kernel got me ~5 working hibernate/thaw cycles, and I didn't notice THIS issue with it.

Long version (sorry if it is ranty; I know it's not your fault!): Before I upgraded to F13 and the 2.6.33 series, I applied the patch for KBZ 13811 to 2.6.32.14 and 2.6.32.16 and did see the problem there. I had been running 2.6.32.8 for the last 6 months and did not notice it, but 13811 would usually strike first; that 2.6.32.8 would usually let me get ~5 hibernate/thaw cycles before dying, so I can't really say for sure.

Ever since Fedora introduced (and required) KMS in Fedora 11, the kernel has been in regression, since at least 2.6.29. I have been using in-kernel hibernate/thaw (pmdisk/swsusp) since about 2.6.9 or 2.6.10, around the time the pmdisk and swsusp work was big. I used to build my own kernels to configure it in, as Fedora's kernels at the time didn't include it, and generally (with a few hiccups here and there) it worked until the 2.6.29.4 that Fedora shipped in F11. The last kernel that just plain worked was 2.6.27.44, as shipped in the last update of Fedora 10.

For me, the entire KMS/GEM project has been nothing but a regression: the mode-switch blink when switching to X never bothered me, and I've lost the ability to hibernate/thaw my laptop reliably for over a year. KBZ 13811, regardless of how it was marked, was a regression: swsusp worked before KMS arrived in Fedora's 2.6.29, and it didn't work after.
This may be more of a virgin bug in KMS/GEM, but the overall impact is a regression. I'd try anything from 2.6.35, but does that work with the libdrm/mesa/xorg-intel driver that Fedora is shipping for F13? That's rhetorical, but it reflects the uncertainty and doubt about whether you can use Fedora as a base and have a workable system, at least with Intel graphics. Thanks!
On Friday, July 23, 2010, Jesse Barnes wrote:
> On Fri, 23 Jul 2010 14:15:55 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.33 and 2.6.34.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.33 and 2.6.34. Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16388
> > Subject   : i915 drm BUG: unable to handle kernel paging request at a5e89046
> > Submitter : <lists@clanduggan.org>
> > Date      : 2010-07-14 16:59 (10 days old)
>
> Looks like some potential memory corruption? At resume we try to get
> connector info but panic due to a bad pointer, maybe in one of the
> lists. Can you gdb your drm_kms_helper module and do "list
> *drm_mode_getconnector+0x295" to see what line this is?
>
> Also, what chipset do you have? Maybe I can reproduce it here with
> your kernel config.
This is probably due to the i915 hibernation memory corruption bug, and should be fixed by:

commit 985b823b919273fe1327d56d2196b4f92e5d0fae
drm/i915: fix hibernation since i915 self-reclaim fixes

commit cd9f040df6ce46573760a507cb88192d05d27d86
drm/i915: add 'reclaimable' to i915 self-reclaimable page allocations

And yes, those are in Fedora now.
And it looks like those two are needed in 2.6.32-stable, since the patch that caused the bug went into 2.6.32.8 as drm-i915-selectively-enable-self-reclaim.patch.
Created attachment 27261
Config per jbarnes

Sorry for the delay.

On Fri, 23 Jul 2010 10:37:12 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>
> Looks like some potential memory corruption? At resume we try to get
> connector info but panic due to a bad pointer, maybe in one of the
> lists. Can you gdb your drm_kms_helper module and do "list
> *drm_mode_getconnector+0x295" to see what line this is?
>

(gdb) list *drm_mode_getconnector+0x295
0x20f3 is in drm_mode_getconnector (drivers/gpu/drm/drm_crtc.c:1417).
1412 }
1413 copied++;
1414 }
1415 }
1416 }
1417 out_resp->count_encoders = encoders_count;
1418
1419 out:
1420 mutex_unlock(&dev->mode_config.mutex);
1421 return ret;

> Also, what chipset do you have? Maybe I can reproduce it here with
> your kernel config.

0:00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 01bd
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at eff00000 (32-bit, non-prefetchable) [size=512K]
	Region 1: I/O ports at eff8 [size=8]
	Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M]
	Region 3: Memory at efec0000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: i915
	Kernel modules: i915

0000:00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
	Subsystem: Dell Device 01bd
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: Memory at eff80000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Weird death following resume. It's either the write to a bit of memory we have just allocated for the ioctl, or the connector is corrupt. Definitely fits the pattern for the i915 hibernation bug.