Summary: Kernel oops in swsusp_save on suspend (IOMMU-related)
Product: Power Management
Component: Hibernation/Suspend
Severity: normal
Reporter: Alexandre Julliard (julliard)
Assignee: Rafael J. Wysocki (rjwysocki)
CC: acpi-bugzilla, akpm, andi-bz, lenb, mingo, nigel, rjw, tglx
Description Alexandre Julliard 2007-12-11 05:30:33 UTC
Most recent kernel where this bug did not occur: n/a
Distribution: Debian sid
Hardware Environment: AMD Athlon 64 X2 dual core with 4 GB RAM, nForce 500 SLI chipset
Software Environment: Linux wine.dyndns.org 126.96.36.199-gb506e24f-dirty #9 SMP Tue Dec 11 11:00:48 CET 2007 x86_64 GNU/Linux
Problem Description:
I'm getting a crash in swsusp_save() on suspend, when it tries to access address 0xffff810008000000 (sorry, I don't have the full oops; let me know if you want me to copy it down by hand). This address is apparently the first page in the GART IOMMU range:

    PCI-DMA: Disabling AGP.
    PCI-DMA: aperture base @ 8000000 size 65536 KB
    PCI-DMA: using GART IOMMU.
    PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture

My guess is that the IOMMU aperture range should somehow be skipped when copying. Suspending works fine if I boot with iommu=soft or with mem=3G.
Comment 1 Andrew Morton 2007-12-11 12:14:00 UTC
So you haven't tested any kernel earlier than 2.6.23? Yes, a copy of the oops would be great, please. You shouldn't need to write it down; try netconsole (see Documentation/networking/netconsole.txt). It's worth setting up netconsole...
Comment 2 Rafael J. Wysocki 2007-12-11 13:38:36 UTC
netconsole won't work at that point (devices suspended, interrupts disabled). A serial console might be useful, but I doubt that box has a serial port. I guess the problem is present in all kernels to date. Alexandre, can you attach the dmesg output, please?
Comment 3 Alexandre Julliard 2007-12-11 14:10:13 UTC
Created attachment 13982: dmesg output
dmesg output attached. I haven't tried other kernels; if that would be useful I could do it. Is there any version you want me to try? The box does have a serial port, but I'm afraid I don't have anything to plug into it.
Comment 4 Ingo Molnar 2007-12-12 02:15:47 UTC
> netconsole won't work at that point (devices suspended, interrupts
> disabled). Serial console might be useful, though, but I doubt that
> box has a serial port.
>
> I guess the problem is present in all kernels to date.

At least as far as netconsole output going _into_ suspend goes, I posted some really bad hacks to lkml some time ago that allow a per-device exclusion from the suspend sequence (the suspend_disabled flag). That way I was able to get netconsole output far into the suspend, up to the point where we do the ACPI MMIO command that physically suspends the CPU.

Getting output from the system when it is coming out of resume is much harder. (But this crash is about going into the suspend, right?)
Comment 5 Rafael J. Wysocki 2007-12-12 15:30:32 UTC
(In reply to comment #4)
>
> getting output from the system when it is coming out of resume is much
> harder. (but this crash is about going into the suspend, right?)

Yes, but it happens in the middle of the "critical section" in which everything is supposed to be off except for the CPU executing the code. IOW, it's very much like a resume failure ...
Comment 6 Rafael J. Wysocki 2007-12-12 15:38:57 UTC
(In reply to comment #3)
> Created an attachment (id=13982)
> dmesg output
>
> dmesg output attached.

Thanks.

> I haven't tried other kernels, if that would be useful I could do it, any
> version you want me to try?

Hm, it looks like this problem has always been present ...

> The box does have a serial port but I don't have anything to plug into it I'm
> afraid.

I see. Well, it seems that the IOMMU driver should mark the aperture as "nosave" for us (it overlaps with a memory area that the image-creating code considers usable). Did you try to enable the IOMMU option in the BIOS setup, BTW?
Comment 7 Alexandre Julliard 2007-12-13 01:41:16 UTC
(In reply to comment #6)
> Did you try to enable the IOMMU option in the BIOS setup, BTW?

There doesn't seem to be any way to configure the IOMMU in my BIOS setup, or if there is one, I couldn't find it... It's an ASUS M2N-E SLI, chipset nForce 500.
Comment 8 Rafael J. Wysocki 2007-12-13 07:28:32 UTC
OK, thanks. Probably Asus doesn't think you'd need that. Unfortunately, I'm not familiar with the IOMMU handling code, so I'm afraid it'll take some time to come up with a fix ...
Comment 9 Zhang Rui 2008-11-19 23:21:54 UTC
Hmm, what if you boot with iommu=off?
Comment 10 Alexandre Julliard 2008-11-22 06:57:58 UTC
Actually I retested and the bug is fixed now, most likely by commit 2050d45d7c32cbad7a070d04256237144a0920db.