Created attachment 107701 [details]
This is the dmesg output of the 3.10.10 kernel on my system.

On a system that uses the i915 display driver, the kernel reports the memory reserved for the BIOS as corrupted 1 minute after boot-up. If the i915 module is not loaded on boot-up, no corruption is detected.

This issue is not caused by faulty RAM sticks. I have run memtest86+ v4.20 on my system, and after 8 passes, no errors were reported.

I first came across this issue when I updated the kernel on my Gentoo system from 3.8.13 to 3.10.7. Since I was using gentoo-sources, I decided to try out upstream's 3.10.10 kernel as well as the latest sources straight from git, and ran into the same issue with both kernels. I will attach the output of dmesg from the 3.10.10 kernel.

Steps to Reproduce:

Using a kernel >3.8.13:

1.) Configure your kernel. Make sure that your test is being done on a system that needs the i915 driver. Make sure that the i915 driver is built as a module, and make sure you have X86_CHECK_BIOS_CORRUPTION=y.
2.) Boot your system with the new kernel.
3.) Watch your logs (e.g. /var/log/messages, or pipe dmesg output to a log file). See whether the kernel reports any low memory corruption.

I have also done a bisect. "git bisect" suggests that the first bad commit was:

95c9608478d639dcffc14ea47b31bff021a99ed1 is the first bad commit
commit 95c9608478d639dcffc14ea47b31bff021a99ed1
Author: H. Peter Anvin <hpa@zytor.com>
Date:   Thu Feb 14 14:02:52 2013 -0800

    x86, mm: Move reserving low memory later in initialization

    Move the reservation of low memory, except for the 4K which actually
    does belong to the BIOS, later in the initialization; in particular,
    after we have already reserved the trampoline.

    The current code locates the trampoline as high as possible, so by
    deferring the allocation we will still be able to reserve as much
    memory as is possible. This allows us to run with reservelow=640k
    without getting a crash on system startup.

    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Link: http://lkml.kernel.org/n/tip-0y9dqmmsousf69wutxwl3kkf@git.kernel.org

:040000 040000 365acf4d8c7e201ff7674dc46f6d5ac3a8b889ae 48081dd511455dddbb93e979f4358449d2533beb M arch

If more information is needed from me, please let me know.
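For anyone repeating the reproduction steps above, the configuration check and the log check could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, CONFIG_DRM_I915 is assumed to be the option that builds the i915 driver, and the string "Corrupted low memory at" is assumed to be the text the detector prints, so compare it against the messages in your own dmesg before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers for the reproduction steps; names are illustrative only.

# Succeed if the given kernel config file enables the low-memory corruption detector.
has_corruption_check() {
    grep -q '^CONFIG_X86_CHECK_BIOS_CORRUPTION=y$' "$1"
}

# Succeed if the given kernel config file builds i915 as a module
# (assumes CONFIG_DRM_I915 is the relevant option).
i915_is_module() {
    grep -q '^CONFIG_DRM_I915=m$' "$1"
}

# Print the number of detector reports found in a kernel log read from stdin.
# The match string is an assumption about the detector's message text.
count_lowmem_corruption() {
    grep -c 'Corrupted low memory at' || true
}

# Example usage (not run here):
#   has_corruption_check /usr/src/linux/.config && echo "detector enabled"
#   dmesg | count_lowmem_corruption
```

A count above zero from the last command means the detector has reported at least once since boot.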
This is strange. Have you confirmed the bisect by reverting the offending commit? Also please boot with drm.debug=0xe and attach the complete dmesg so we know what kind of gfx hw you have.
Daniel, I've reverted the commit mentioned in the bisect. The memory corruption bug doesn't happen after the revert, so the bisect is confirmed. This bug is still present in the latest git version of the kernel (currently 3.12+). I will attach the dmesg output of that kernel with the drm.debug=0xe option enabled.
Created attachment 115171 [details] dmesg output of kernel 3.12 with drm.debug=0xe option enabled
Still confused. Another trick to play is to prevent i915 from loading with i915.modeset=0. Then wait for a while to make sure that we'd catch any lowmem corruption. Then stop X and reload i915 for real with

# rmmod i915
# modprobe i915 modeset=1

The big question is whether the lowmem corruption happens anyway or whether we need to load the i915 modeset driver (which would point at either bios leftovers or a bug in our takeover sequence).
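One way to make the before/after comparison in this test less error-prone is to save the kernel log before and after reloading the module and compare the number of detector reports. The sketch below assumes the detector's message contains "Corrupted low memory at"; the function names and log file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the match string is an assumption about the detector output.

# Print the number of detector reports in the given log file.
corruption_count() {
    grep -c 'Corrupted low memory at' "$1" || true
}

# Print how many new reports appear in the second log compared to the first.
corruption_delta() {
    before=$(corruption_count "$1")
    after=$(corruption_count "$2")
    echo $((after - before))
}

# Example usage as root, after booting with i915.modeset=0 and stopping X
# (not run here):
#   dmesg > before.log
#   rmmod i915
#   modprobe i915 modeset=1
#   sleep 120            # wait longer than the roughly 1 minute seen in this report
#   dmesg > after.log
#   corruption_delta before.log after.log
```

A delta of zero would suggest the corruption does not depend on the modeset driver being loaded; a positive delta would point at the load itself.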
After blacklisting the module using /etc/modprobe.d/blacklist.conf, setting i915.modeset=0, booting the system and waiting a while, I found that the bug didn't trigger until a few seconds after I loaded the i915 module manually using

# modprobe i915 modeset=1

If I set i915.modeset=0, but don't blacklist the module, the low memory corruption happens.
The bisection is bogus, simply because all that commit does is make the detector actually work as advertised. Without the patch, the detector misses large swaths of memory, typically at least all memory below 64K (which includes your corruption address).

Either way, there isn't anything to actually *do* here... the BIOS clobbers memory that we normally reserve, for exactly the reason that the BIOS has a nasty tendency to scribble on low memory. The only thing that is happening is that you have enabled the detector that tells you that this did indeed happen.

This is probably something you want to fix in your kernel configuration:

[ 0.000000] smpboot: 8 Processors exceeds NR_CPUS limit of 4
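As a rough illustration of the configuration fix hinted at by that last log line, the NR_CPUS limit in a kernel config can be compared against the number of processors the kernel found. The helper names below are hypothetical, and the sketch assumes the limit is stored as CONFIG_NR_CPUS in the config file.

```shell
#!/bin/sh
# Hypothetical helpers; assume the limit is stored as CONFIG_NR_CPUS=<n>.

# Print the CONFIG_NR_CPUS value from the given kernel config file.
nr_cpus_limit() {
    sed -n 's/^CONFIG_NR_CPUS=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Succeed if the configured limit is at least the given processor count.
nr_cpus_sufficient() {
    limit=$(nr_cpus_limit "$1")
    [ -n "$limit" ] && [ "$limit" -ge "$2" ]
}

# Example usage (not run here):
#   nr_cpus_sufficient /usr/src/linux/.config 8 || echo "raise CONFIG_NR_CPUS"
```

With the values from the log line above (a limit of 4 and 8 processors), the check would fail, so the limit would need to be raised to at least 8.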