Most recent kernel where this bug did not occur: n/a
Distribution: Debian sid
Hardware Environment: AMD Athlon 64 X2 dual core with 4 GB RAM, chipset nForce 500 SLI
Software Environment: Linux wine.dyndns.org 2.6.23.8-gb506e24f-dirty #9 SMP Tue Dec 11 11:00:48 CET 2007 x86_64 GNU/Linux
Problem Description:

I'm getting a crash in swsusp_save() on suspend, when it tries to access address 0xffff810008000000 (sorry, I don't have the full oops; let me know if you want me to copy it down by hand). This address is apparently the first page in the GART IOMMU range:

PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 8000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture

My guess is that the IOMMU aperture range should somehow be skipped when copying. Suspending works fine if I boot with iommu=soft, or with mem=3G.
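The link between the faulting address and the aperture base can be checked with a little arithmetic. The sketch below assumes that x86_64 kernels of this era map physical memory linearly starting at 0xffff810000000000; that constant is an assumption about the kernel, not something stated in the report. The helper names are hypothetical and exist only for illustration.

```python
# Assumption: base of the x86_64 kernel direct mapping (PAGE_OFFSET)
# for 2.6.23-era kernels. Not taken from the bug report itself.
PAGE_OFFSET = 0xFFFF810000000000

# Values taken from the report and its boot messages.
FAULT_ADDR = 0xFFFF810008000000      # address swsusp_save() faulted on
APERTURE_BASE = 0x8000000            # "aperture base @ 8000000"
APERTURE_SIZE = 65536 * 1024         # "size 65536 KB"


def phys_to_virt(phys):
    """Hypothetical helper: direct-mapped virtual address of a physical address."""
    return PAGE_OFFSET + phys


def virt_to_phys(virt):
    """Hypothetical helper: physical address behind a direct-mapped virtual address."""
    return virt - PAGE_OFFSET


def in_aperture(phys):
    """True if a physical address falls inside the reported aperture."""
    return APERTURE_BASE <= phys < APERTURE_BASE + APERTURE_SIZE


if __name__ == "__main__":
    phys = virt_to_phys(FAULT_ADDR)
    print(hex(phys), in_aperture(phys), phys == APERTURE_BASE)
```

Under that assumption, the faulting address translates to exactly the first byte of the aperture, which matches the reporter's reading of the boot messages.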
So you haven't tested any kernel earlier than 2.6.23? Yes, a copy of the oops would be great, please. You shouldn't need to write it down - try netconsole (Documentation/networking/netconsole.txt). It's worth setting up netconsole...
netconsole won't work at that point (devices suspended, interrupts disabled). Serial console might be useful, though, but I doubt that box has a serial port. I guess the problem is present in all kernels to date. Alexandre, can you attach a dmesg output, please?
Created attachment 13982
dmesg output

dmesg output attached.

I haven't tried other kernels, if that would be useful I could do it, any version you want me to try?

The box does have a serial port but I don't have anything to plug into it I'm afraid.
> netconsole won't work at that point (devices suspended, interrupts
> disabled). Serial console might be useful, though, but I doubt that
> box has a serial port.
>
> I guess the problem is present in all kernels to date.

At least as far as netconsole output going _into_ suspend is concerned, I posted some really bad hacks to lkml some time ago that allow a per-device exclusion from the suspend sequence (the suspend_disabled flag). That way I was able to get netconsole output far into the suspend, up to the point where we do the ACPI mmio command that physically suspends the CPU.

Getting output from the system when it is coming out of resume is much harder. (But this crash is about going into the suspend, right?)
(In reply to comment #4)
>
> getting output from the system when it is coming out of resume is much
> harder. (but this crash is about going into the suspend, right?)

Yes, but it happens in the middle of the "critical section" in which everything is supposed to be off, except for the CPU executing the code. IOW, it's very much like a resume failure ...
(In reply to comment #3)
> Created an attachment (id=13982)
> dmesg output
>
> dmesg output attached.

Thanks.

> I haven't tried other kernels, if that would be useful I could do it, any
> version you want me to try?

Hm, it looks like this problem has always been present ...

> The box does have a serial port but I don't have anything to plug into it I'm
> afraid.

I see.

Well, it seems that the IOMMU driver should mark the aperture as "nosave" for us (it overlaps with a memory area that the image-creating code considers usable).

Did you try to enable the IOMMU option in the BIOS setup, BTW?
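The "nosave" idea above can be illustrated with a toy model: subtract the aperture's page-frame range from the memory the image-creating code would otherwise copy. This is only a sketch under stated assumptions, not the kernel's implementation. The function names are hypothetical, and the RAM map is simplified to a single contiguous 4 GB range.

```python
PAGE_SHIFT = 12  # 4 KiB pages, as on x86_64


def pfn_range(base, size):
    """Half-open page-frame range [start, end) covering a physical region."""
    start = base >> PAGE_SHIFT
    end = (base + size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
    return start, end


def subtract_nosave(ram_ranges, nosave_ranges):
    """Return the parts of ram_ranges that do not overlap any nosave range."""
    result = []
    for start, end in ram_ranges:
        cur = start
        for ns_start, ns_end in sorted(nosave_ranges):
            if ns_end <= cur or ns_start >= end:
                continue  # no overlap with what is left of this RAM range
            if ns_start > cur:
                result.append((cur, ns_start))  # keep the part before the hole
            cur = max(cur, ns_end)              # skip over the nosave hole
        if cur < end:
            result.append((cur, end))
    return result


if __name__ == "__main__":
    # Hypothetical, simplified RAM map: one contiguous 4 GB range.
    ram = [pfn_range(0, 4 << 30)]
    # Aperture from the boot messages: base 0x8000000, 65536 KB.
    aperture = pfn_range(0x8000000, 65536 * 1024)
    for start, end in subtract_nosave(ram, [aperture]):
        print(hex(start), hex(end))
```

In this model the copy loop would only ever see the ranges before and after the aperture, so it would never touch the page that caused the reported crash.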
(In reply to comment #6)
> Did you try to enable the IOMMU option in the BIOS setup, BTW?

There doesn't seem to be any way to configure IOMMU in my BIOS setup, or if there is one I couldn't find it... It's an ASUS M2N-E SLI, chipset nForce 500.
OK, thanks. Probably Asus doesn't think you'd need that. Unfortunately, I'm not familiar with the IOMMU handling code, so I'm afraid it'll take some time to come up with a fix ...
hmmm, what if you boot with iommu=off?
Actually I retested and the bug is fixed now, most likely by commit 2050d45d7c32cbad7a070d04256237144a0920db.
commit 2050d45d7c32cbad7a070d04256237144a0920db
Author: Pavel Machek <pavel@ucw.cz>
Date:   Thu Mar 13 23:05:41 2008 +0100

    x86: fix long standing bug with usb after hibernation with 4GB ram

shipped in 2.6.25-rc7

closed