Bug 11237 - corrupt PMD after resume
Status: CLOSED CODE_FIX
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 normal
Assigned To: platform_x86_64@kernel-bugs.osdl.org
Duplicates: 11313
Depends on:
Blocks: 7216
Reported: 2008-08-02 14:04 UTC by Rafael J. Wysocki
Modified: 2008-12-20 04:02 UTC

See Also:
Kernel Version: 2.6.27-rc1
Tree: Mainline
Regression: ---


Attachments
2.6.26-rc9-00681-g1a98fd1 boot messages (50.43 KB, text/plain)
2008-08-05 12:45 UTC, Alan Jenkins
Kernel log 2.6.27-rc3 init=/bin/bash (40.94 KB, text/plain)
2008-08-13 04:58 UTC, Alan Jenkins
Original BUG sequence (first one in ext3) (89.43 KB, text/plain)
2008-08-14 01:00 UTC, Alan Jenkins
BUG calltrace 1 (18.07 KB, text/plain)
2008-08-14 01:15 UTC, Alan Jenkins
Bug calltrace 2 (19.09 KB, text/plain)
2008-08-14 01:15 UTC, Alan Jenkins
Bug calltrace 3 (47.04 KB, text/plain)
2008-08-14 01:16 UTC, Alan Jenkins
Bug calltrace 4 (23.51 KB, text/plain)
2008-08-14 01:16 UTC, Alan Jenkins
Bug calltrace 5 (22.63 KB, text/plain)
2008-08-14 01:16 UTC, Alan Jenkins
Print info on initial kernel mappings (913 bytes, patch)
2008-08-20 08:58 UTC, Jeremy Fitzhardinge
dmesg with patch applied (59.32 KB, text/plain)
2008-08-20 09:33 UTC, Alan Jenkins
Page tables from 2.6.27-rc3 (76.49 KB, text/plain)
2008-08-21 02:06 UTC, Alan Jenkins
2.6.26 dmesg (32.66 KB, text/plain)
2008-08-21 12:25 UTC, Alan Jenkins
"last known good" dmesg (33.51 KB, text/plain)
2008-08-22 01:36 UTC, Alan Jenkins
dmesg from "bad" kernel (56.27 KB, text/plain)
2008-08-23 01:09 UTC, Alan Jenkins
dmidecode on faulty system (11.73 KB, text/plain)
2008-08-28 09:00 UTC, Alan Jenkins
dmidecode (5.06 KB, text/plain)
2008-08-28 12:55 UTC, Rafał Miłecki
Proposed patch for mainline to workaround this problem and detect other instances of corruption (6.04 KB, patch)
2008-08-28 12:59 UTC, Jeremy Fitzhardinge
dmesg | grep -i corrup (245.28 KB, text/plain)
2008-08-29 03:01 UTC, Rafał Miłecki
dmesg 2.6.26 suspend/resume (23.91 KB, text/plain)
2008-09-12 15:46 UTC, Andy Wettstein
dmesg 2.6.27-rc5 bios corruption (122.87 KB, text/plain)
2008-09-12 21:44 UTC, Andy Wettstein
corrected dmesg 2.6.27-rc6 bios corruption (48.28 KB, text/plain)
2008-09-14 19:06 UTC, Andy Wettstein
msi dmidecode (6.43 KB, text/plain)
2008-09-15 15:01 UTC, Andy Wettstein
dmesg from tip (56.37 KB, text/plain)
2008-09-15 15:03 UTC, Andy Wettstein
dmesg with CONFIG_X86_RESERVE_LOW_64K=y (35.18 KB, text/plain)
2008-09-17 07:36 UTC, Andy Wettstein
dmesg from tip (commit 74546a8cd9a4e2...) (37.19 KB, text/plain)
2008-09-19 06:59 UTC, Rafał Miłecki
photos of messages after typing "init 5" (478.12 KB, image/jpeg)
2008-09-19 07:17 UTC, Rafał Miłecki
dmesg from tip (commit 7f4da50cdcac...) (37.33 KB, text/plain)
2008-09-26 12:01 UTC, Rafał Miłecki

Description Rafael J. Wysocki 2008-08-02 14:04:47 UTC
Subject    : [BUG] 2.6.27-rc1 in ext3_find_entry
Submitter  : Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Date       : 2008-08-02 9:51
References : http://marc.info/?l=linux-kernel&m=121767073424952&w=4
Handled-By : Hugh Dickins <hugh@veritas.com>

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Alan Jenkins 2008-08-03 03:58:23 UTC
Hugh: Thanks for your suggestions for debugging the corrupt PMD entries, but they're a bit scary for me :-).  I'll try them if I can't get anything better.

I read basic-pm-debugging.txt and tried

echo core > /sys/power/pm_test # Do everything except the actual suspend
echo mem > /sys/power/state

and also a complete suspend to disk.  But the BUG only seems to trigger with a complete suspend to ram.

It might still be a driver bug though.  I'm now using a statically linked version of s2ram, so I should be able to try it from the initramfs before drivers are loaded.  (I just need to steal a PS/2 keyboard :-).
Comment 2 Alan Jenkins 2008-08-03 04:56:47 UTC
I ran s2ram from my Ubuntu initramfs, using a "break=premount" boot option.  I checked and the only modules which had been loaded were related to software RAID.  I also checked the contents of /sys/bus/pci/drivers.  There were only two pci drivers, "serial" and "pcieport-driver".  I don't use a framebuffer driver either.  So that just leaves ACPI/pnp/platform/system devices...

I didn't get any BUGs printed to the console, so I continued the boot process and logged into my KDE session.  Previously, I found that after booting and suspending, I had to log in, check my email etc. to reproduce the problem.  I did the same thing and this time it panicked (locked up with flashing keyboard LEDs) before I could run "dmesg" to check for the BUGs.
Comment 3 Alan Jenkins 2008-08-04 06:41:27 UTC
I've bisected it to somewhere between v2.6.26-rc5-213-g1eede07 and v2.6.26-rc9-696-g329513a.  Unfortunately this seems to be a big hole of 132 commits worth of badness.  

So far I've found a couple of different build failures, a BUG that kills init during early boot, a panic just after "Kernel really alive"... nothing that works well enough to test suspend.
Comment 4 Alan Jenkins 2008-08-05 10:17:47 UTC
I've bisected this down further to v2.6.26-rc9-547-ga939098..v2.6.26-rc9-615-gd86623a.  These are all x86 arch commits, mostly unification.  They are non-bisectable due to build breakage.

Looking at the one-line commit log, there's nothing that directly mentions suspend to ram.  However there must be something in here that's breaking it, causing it to resume with these invalid page... thingies.  IIRC, the resume path re-uses some code (maybe including assembler?) from the boot path.  So something in here happens to work for boot but not suspend.

BTW I found running gitk is a great way to reproduce this BUG after resume.
Comment 5 Alan Jenkins 2008-08-05 10:21:21 UTC
Please change this bug to Platform/x86_64, to alert the appropriate people(s).
Comment 6 Hugh Dickins 2008-08-05 11:19:14 UTC
You seem to be making useful progress, thanks for your efforts.
Later tonight or tomorrow I'll look for clues in the output of
git diff -u v2.6.26-rc9-547-ga939098 v2.6.26-rc9-615-gd86623a

I notice there's a fair number of max_pfn patches in there:
so rather than ask you what your max_pfn is, or how much RAM
you have, please may I ask you to put the early part of your
bootup dmesg into this bugzilla - say, up as far as the
Freeing unused kernel memory: XXXk freed
though it's mainly the BIOS-e820 map at the beginning I'm
thinking might turn out to be useful when framing hypotheses.

Comment 7 Alan Jenkins 2008-08-05 12:40:09 UTC
Sorry, but what I had on disk as "v2.6.26-rc9-615-gd86623a" wasn't.  I.e. it claims it's a different version in dmesg.  I can't actually build v2.6.26-rc9-615-gd86623a.

Please use v2.6.26-rc9-547-ga939098..v2.6.26-rc9-00681-g1a98fd1.  (v2.6.26-rc9-00681-g1a98fd1~1 and immediate predecessors are the ones that panic just after "Kernel really alive").
Comment 8 Alan Jenkins 2008-08-05 12:45:56 UTC
Created attachment 17096 [details]
2.6.26-rc9-00681-g1a98fd1 boot messages

It also seems I missed the very noisy CPA self-test failures in this kernel log!  Note this happens *before* I suspend.
Comment 9 Hugh Dickins 2008-08-08 10:13:24 UTC
I've spent a while on this but made little progress, sorry.

I've probably spent too long wondering about those CPA self-test failures,
which I now think irrelevant.  They all stem from the fact that the first
of the two pmds which cover the kernel's direct map of your 2GB didn't
have the global bit set in its entries (whereas the second did, to judge
by the 201 failures out of 400 tests), so clearing that bit never split
the level.  That could easily be a bug at that point of the bisection,
which got fixed later (certainly the test got changed later, not to use
the global bit): before we're finished I should include a patch to check
that the global bit is now being set with your latest kernel, if it is
then those CPA failures won't be worth spending more time on.
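
The "two pmds" figure follows from x86-64 page-table geometry: one PMD page holds 512 entries of 2 MiB each, so it covers 1 GiB of the direct map. A minimal Python sketch of that arithmetic; the function name is illustrative only, and the 0x7f6e0000 limit is taken from the "kernel direct mapping tables up to" dmesg line quoted further down in this report.

```python
PMD_ENTRIES = 512                 # entries per page-table page on x86-64
PMD_ENTRY_SPAN = 2 << 20          # each PMD entry maps 2 MiB
PMD_PAGE_SPAN = PMD_ENTRIES * PMD_ENTRY_SPAN   # 1 GiB per PMD page


def pmd_pages_needed(mapped_bytes):
    """Number of PMD pages needed to direct-map the first mapped_bytes."""
    return -(-mapped_bytes // PMD_PAGE_SPAN)   # ceiling division


# "kernel direct mapping tables up to 7f6e0000" (just under 2 GiB)
direct_map_end = 0x7f6e0000
print(pmd_pages_needed(direct_map_end))
```

With just under 2 GiB mapped, the first PMD page covers 0 to 1 GiB and the second covers the remainder, which is why roughly half of the randomly placed CPA tests would land in each.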

But I've still no ideas about the corruption you're seeing in the second
of those pmds.  I've gone through the diff, though I wouldn't pretend
with a fine-toothed comb, and nothing stood out as a good suspect.  It's
really too big a diff for me to grasp, shame about that eclipse in the
bisection ("eclipse" is what comes to my mind when I hit an impenetrable
area of a bisection like that): there may be ways to narrow it, but it's
tedious on one's own machine, and worse trying to direct someone else
at a distance.  And it's perfectly possible that changes here are just
shifting pre-existing or potential corruption to where it does visible
damage: the changes you've bisected to are not necessarily to blame.

I've not yet worked out a sensible next step.  Presumably you saw a
variety of oopses as you bisected down, it would be worth attaching
those to the bugzilla, in case there's something to be learnt from
their pattern or their spread.  But right now I'm stale: I'm going
to break off for a day or two.

Comment 10 Rafael J. Wysocki 2008-08-10 05:43:50 UTC
On Sunday, 10 of August 2008, Hugh Dickins wrote:
> On Sat, 9 Aug 2008, Theodore Tso wrote:
> > On Sun, Aug 10, 2008 at 12:43:51AM +0200, Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.26.  Please verify if it still should be listed and let me know
> > > (either way).
> 
> Yes, it should still be listed.
> 
> > > 
> > > 
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11237
> > > Subject		: [BUG] 2.6.27-rc1 in ext3_find_entry
> > > Submitter	: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> > > Date		: 2008-08-02 9:51 (8 days old)
> > > References	: http://marc.info/?l=linux-kernel&m=121767073424952&w=4
> > > Handled-By	: Hugh Dickins <hugh@veritas.com>
> > > 
> > 
> > You might want to change the description to include that it occurred
> > after a suspend/resume; Hugh suspects corrupted PMD entries as the
> > cause of the crash, and not necessarily anything in the ext3 code.  So
> > the title might be a bit misleading.  (At the same time, if it turns out
> > that the suspend/resume was a red herring, and it looks more like a
> > real ext3 bug, please send a note to that effect; right now I'm not
> > paying attention to this bug.)
> 
> Right, there's no reason at all to suppose it's related to ext3,
> that just happened to be the first victim of the corruption on
> one occasion.  Carry on paying no attention to this bug, Ted.
> "corrupt PMD after resume" perhaps.

Comment 11 Hugh Dickins 2008-08-11 12:48:44 UTC
I suppose we could try for a very lazy way out by asking:
do you see similar symptoms with 2.6.27-rc2 or current -git?
Though I won't feel very satisfied if you answer "no".

Comment 12 Alan Jenkins 2008-08-13 04:58:27 UTC
Created attachment 17215 [details]
Kernel log 2.6.27-rc3 init=/bin/bash

Thanks for honesty.  No, it's still here.

Reproduced on 2.6.27-rc3 with init=/bin/bash
- wait 30s for CPA self test - success
s2ram --force --acpi_sleep=3
- wait 30s for CPA self test - BUG

So I guess you're right that the CPA failures before suspend on bisect/bad were not significant.
Comment 13 Hugh Dickins 2008-08-13 17:23:41 UTC
Thanks for trying.  Your "No" is not the "no" which would have
left me unsatisfied, it's a "yes" which says we can't be lazy.
What a satisfying pity ;)

It's not at all surprising that the CPA self-test was successful:
even if what caused it to fail before is still an issue, the test
is now flipping a different bit, which is less likely to go wrong.

I'm uncertain whether you're implying that the CPA test after
resume was involved in the BUG, or whether you're just saying
that in the seconds you were waiting the BUG happened to occur.
I assume the latter (but not ruling out a connection).

Do you see the bug if CONFIG_CPA_DEBUG is switched off?
(Don't wait very long to suspend after bootup, after a few
minutes the CPA testing has split all and exhausted itself.)

The x86 laptop I can suspend/resume when _32 is these days not
resuming when _64 (not a recent regression, I believe).  Seems to
be just a video issue, maybe I should try tweaking its s2ram args,
or try a more uptodate s2ram.  If I can get it resuming usefully
when _64, it may be worth my trying CONFIG_CPA_DEBUG=y on it too.

(Well, I have been using that config a lot in the last few days,
for /proc/meminfo fixes; but not together with resume.)

I was hoping (did ask) for you to attach some more log extracts:
whenever you decided a bisection was bad, or saw the BUG above,
wasn't there an oops recorded in /var/log/messages?  The more of
those I can see (within reason!), the better the (not good)
chance I can make sense of them.  Thanks.

Comment 14 Alan Jenkins 2008-08-14 00:57:50 UTC
The CPA test after resume *was* involved in the bug.   "do_pageattr_test" is on the calltrace.  The calltrace is at the end of the dmesg I posted.

When I was testing earlier & bisecting, I did not have CONFIG_CPA_DEBUG enabled.  Instead of just waiting, I had to trigger the bug by logging into X and running some programs.  I found "gitk" was a very reliable way to trigger it.

Sorry about the logs.  I misunderstood and thought you were talking about the problems within the bisection "eclipse".  (Most of that was build errors and panics on early boot which leave no calltrace).

I didn't keep a persistent log of the BUGs myself.  I'll see how easy it is to dig them out from /var/log/messages.  There were certainly a variety of different ones and I can see how that could be useful.

When reproducing this from an X login, I ended up with a stream of BUGs which soon left the computer unusable.  I'll attach the log from my original report which shows as much as I could capture of one of these sequences.  If I can get a population out of /var/log/messages then I'll limit them to the first three BUGs per sequence.
Comment 15 Alan Jenkins 2008-08-14 01:00:27 UTC
Created attachment 17228 [details]
Original BUG sequence (first one in ext3)

Here's the sequence of BUGs from my initial report.
Comment 16 Alan Jenkins 2008-08-14 01:15:31 UTC
Created attachment 17232 [details]
BUG calltrace 1
Comment 17 Alan Jenkins 2008-08-14 01:15:50 UTC
Created attachment 17233 [details]
Bug calltrace 2
Comment 18 Alan Jenkins 2008-08-14 01:16:05 UTC
Created attachment 17234 [details]
Bug calltrace 3
Comment 19 Alan Jenkins 2008-08-14 01:16:20 UTC
Created attachment 17235 [details]
Bug calltrace 4
Comment 20 Alan Jenkins 2008-08-14 01:16:34 UTC
Created attachment 17236 [details]
Bug calltrace 5
Comment 21 Alan Jenkins 2008-08-14 01:23:40 UTC
That's all for now, I hope it helps our not good chances :-).

I selected a spread of different processes, and they include another trace in ext3.  I have lots more calltraces with git specifically (after I switched to reproducing with gitk), but they all happen in copy_page_c and I didn't notice any interesting differences.
Comment 22 Hugh Dickins 2008-08-14 07:56:28 UTC
On Thu, 14 Aug 2008, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #14 from alan-jenkins@tuffmail.co.uk  2008-08-14 00:57 -------
> The CPA test after resume *was* involved in the bug.   "do_pageattr_test"
> is on the calltrace.  The calltrace is at the end of the dmesg I posted.

Sorry, I was blind and now you've helped me see.  Something about the
bugzilla mail format, I guess the http link always there at the top,
led me to miss the attachment completely.  And thanks for all the
others you've sent: now all I need is time...

Comment 23 Alan Jenkins 2008-08-17 04:27:22 UTC
I've chipped away at this to reduce the suspect changes.  What stands out now is the setup unification.  I can reproduce the bug after the sequence below.  I hope this helps in reducing the diff we have to look at.

# Get badness
git-checkout v2.6.26-rc9-00681-g1a98fd1

# Revert not-bad changes
git-revert d52d53b8a5b258bfaab9223a5e7284fcfdd48577
git-revert d8d5900ef8afc562088f8470feeaf17c4747790f

# docs, 32bit and mach are not relevant
git-checkout v2.6.26-rc9-547-ga939098 Documentation 
git-checkout v2.6.26-rc9-547-ga939098 include/asm-x86/mach-*
git-checkout v2.6.26-rc9-547-ga939098 include/asm-x86/*_32.*
git-checkout v2.6.26-rc9-547-ga939098 arch/x86/*/*_32.*
git-checkout v2.6.26-rc9-547-ga939098 arch/x86/ia32
git-rm arch/x86/kernel/probe_roms_32.c

# Just need 2-line fix to build e820; drop the other changes
git-checkout v2.6.26-rc9-547-ga939098 arch/x86/kernel/e820.c
sed -i "s/\([^_a-zA-Z]\)end_pfn/\1max_pfn/g" arch/x86/kernel/e820.c

# Can also drop all these changed files
xargs git-checkout v2.6.26-rc9-547-ga939098 << EOF
 arch/x86/kernel/acpi/sleep.c
 arch/x86/kernel/apic_64.c
 arch/x86/kernel/asm-offsets_64.c
 arch/x86/kernel/cpu/amd_64.c
 arch/x86/kernel/cpu/common.c
 arch/x86/kernel/cpu/perfctr-watchdog.c
 arch/x86/kernel/efi.c
 arch/x86/kernel/head.c
 arch/x86/kernel/head64.c
 arch/x86/kernel/io_apic_64.c
 arch/x86/kernel/machine_kexec_64.c
 arch/x86/kernel/nmi.c
 arch/x86/kernel/paravirt.c
 arch/x86/kernel/paravirt_patch_64.c
 arch/x86/kernel/probe_roms_32.c
 arch/x86/kernel/process_64.c
 arch/x86/kernel/smpboot.c
 arch/x86/kernel/vsyscall_64.c
 arch/x86/mm/fault.c
 arch/x86/mm/numa_64.c
 arch/x86/mm/pgtable.c
 arch/x86/power/hibernate_64.c
 arch/x86/xen/enlighten.c
 arch/x86/xen/setup.c
 fs/Kconfig
 include/asm-x86/apic.h
 include/asm-x86/cmpxchg_64.h
 include/asm-x86/e820.h
 include/asm-x86/elf.h
 include/asm-x86/hw_irq.h
 include/asm-x86/io_apic.h
 include/asm-x86/msr.h
 include/asm-x86/nmi.h
 include/asm-x86/numa_64.h
 include/asm-x86/paravirt.h
 include/asm-x86/pgalloc.h
 include/asm-x86/required-features.h
 mm/page_alloc.c
EOF
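
For anyone wanting to check what the sed expression in the e820 step does and does not rename, here is a rough Python equivalent. It is only a sketch: the sample input line is made up (it is not taken from e820.c), and the text is processed line by line because sed never matches across a newline.

```python
import re

# Same pattern as the sed expression: one character that is not a letter
# or underscore, followed by "end_pfn".
PATTERN = re.compile(r"([^_a-zA-Z])end_pfn")


def rename_end_pfn(text):
    """Apply the substitution to each line separately, as sed would."""
    return "".join(
        PATTERN.sub(r"\1max_pfn", line)
        for line in text.splitlines(keepends=True)
    )


# Hypothetical input, chosen to exercise the edge cases.
sample = "if (end_pfn > last_pfn) x = find_end_pfn(); y = end_pfn_map;\n"
print(rename_end_pfn(sample), end="")
```

Two properties are worth noting: an `end_pfn` at the very start of a line is left alone (the pattern needs a preceding character), and there is no boundary check after the name, so a longer identifier such as `end_pfn_map` is renamed too.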
Comment 24 Rafael J. Wysocki 2008-08-17 05:14:09 UTC
On Sunday, 17 of August 2008, Hugh Dickins wrote:
> On Sat, 16 Aug 2008, Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11237
> > Subject		: corrupt PMD after resume
> > Submitter	: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> > Date		: 2008-08-02 9:51 (15 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121767073424952&w=4
> > Handled-By	: Hugh Dickins <hugh@veritas.com>
> 
> Definitely should still be listed: Alan has verified it still happens
> with -rc3.  I keep on going back to look at the info he's sent, to
> try and work out what might be happening and what to try next.

Comment 25 Alan Jenkins 2008-08-20 02:19:04 UTC
I found a patch for the early boot problems (http://lists-archives.org/linux-kernel/16541925-next-0704-x86_64-panics-on-booting.html), which allowed me to isolate the bug to a specific commit, 4f9c11dd49fb73e1ec088b27ed6539681a445988.

In other words, this bug appears to be a duplicate of Bug #11313.
Comment 26 Rafael J. Wysocki 2008-08-20 04:00:43 UTC
Thanks for identifying the offending commit:

commit 4f9c11dd49fb73e1ec088b27ed6539681a445988
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date:   Wed Jun 25 00:19:19 2008 -0400

    x86, 64-bit: adjust mapping of physical pagetables to work with Xen

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: xen-devel <xen-devel@lists.xensource.com>
    Cc: Stephen Tweedie <sct@redhat.com>
    Cc: Eduardo Habkost <ehabkost@redhat.com>
    Cc: Mark McLoughlin <markmc@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

but I'm not sure if that's a duplicate.  IMO it's yet another regression caused by the same patch.
Comment 27 Hugh Dickins 2008-08-20 05:27:07 UTC
Great work, Alan, you're a hero.

I agree with you that it looks like a duplicate of Bug #11313
(not a bug I'd paid any attention to until you mentioned it here):
again bad PMDs, again often hitting in clear_page_c or copy_page_c.

But we needn't debate that: the thing is to find a fix to either
and then check if it fixes the other.

I was looking at init_memory_mapping() and what it calls last week
(for the much more trivial business of /proc/meminfo DirectMap lines)
so I'll have another look around there now - but probably with the
same success I've had so far :(

Ah, good, jeremy@goop.org is now on the Cc list for this one.

Comment 28 Jeremy Fitzhardinge 2008-08-20 08:57:24 UTC
Hm, yes, I wouldn't say it's obviously a duplicate.  However:
 - the change in question works fine on many, many machines
 - in this bug, the failure is after a suspend/resume cycle
 - in Bug #11313 the failure is when plugging the HDMI connector

In other words, both actions which involve the BIOS.  I suspect some bad interaction with the firmware, but I don't have any good theories.

Could you try reverting just a6523748bddd38bcec11431f57502090b6014a96, while leaving 4f9c... applied?  Could you also apply the attached patch, and post the boot-time dmesg output?

What's the hardware?  How much memory does it have?  Is there a firmware update available?

In both cases, the oops is in ext3, but that could just be because it does most of the memory touching.  In this case it's simply a not-present pmd, but in #11313 it's corrupted.  But that could just be a coincidence depending on whether the P bit is set in the random corruption.
Comment 29 Jeremy Fitzhardinge 2008-08-20 08:58:41 UTC
Created attachment 17333 [details]
Print info on initial kernel mappings
Comment 30 Alan Jenkins 2008-08-20 09:03:13 UTC
(In reply to comment #28)
> In both cases, the oops is in ext3, but that could just be because it does most
> of the memory touching.  In this case it's simply a not-present pmd, but in
> #11313 it's corrupted.  But that could just be a coincidence depending on
> whether the P bit is set in the random corruption.
> 

ext3 is a red herring - sometimes it shows up in the traces, more often it doesn't.  As Hugh says, it's mainly clear_page_c, copy_page_c, also memset_c.
Comment 31 Alan Jenkins 2008-08-20 09:19:23 UTC
(In reply to comment #28)
> Hm, yes, I wouldn't say it's obviously a duplicate.  However:
>  - the change in question works fine on many, many machines
>  - in this bug, the failure is after a suspend/resume cycle
>  - in Bug #11313 the failure is when plugging the HDMI connector
> 
> In other words, both actions which involve the BIOS.  I suspect some bad
> interaction with the firmware, but I don't have any good theories.
> 
> Could you try reverting just a6523748bddd38bcec11431f57502090b6014a96, while
> leaving 4f9c... applied?  Could you also apply the attached patch, and post the
> boot-time dmesg output?
> 
> What's the hardware?  How much memory does it have?

Desktop, Intel Core 2 Duo 6420.  Intel 965-something chipset. 2G ram.

> Is there a firmware update
> available?

Not sure, but I'd be unlikely to try it if there was one.  dmidecode says the BIOS is "Phoenix Technologies 6.00 PG", released 07/26/2007.
Comment 32 Alan Jenkins 2008-08-20 09:33:20 UTC
Created attachment 17334 [details]
dmesg with patch applied

info on initial kernel mappings
Comment 33 Alan Jenkins 2008-08-20 09:54:55 UTC
(In reply to comment #28)
> Could you try reverting just a6523748bddd38bcec11431f57502090b6014a96, while
> leaving 4f9c... applied?

Done - the paging error still happens.
Comment 34 Jeremy Fitzhardinge 2008-08-20 10:06:31 UTC
(In reply to comment #33)
> Done - the paging error still happens.
> 

Well, consistent with #11313 at least.
Comment 35 H. Peter Anvin 2008-08-20 10:21:53 UTC
Could we get a dump of the full kernel page tables?

To do so:

1. make sure the kernel is configured with CONFIG_X86_PTDUMP.
2. make sure debugfs is mounted
   (mount -t debugfs none /sys/kernel/debug)
3. cat /sys/kernel/debug/kernel_page_tables > kernel_page_tables.txt
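
Step 2 can be checked before attempting step 3. The sketch below is an illustration only (the helper name is made up): it looks for debugfs entries in text laid out like /proc/mounts, where the third field of each line is the filesystem type.

```python
def debugfs_mountpoints(mounts_text):
    """Return the mount points whose filesystem type is debugfs.

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


# Sample text standing in for the contents of /proc/mounts.
sample = (
    "rootfs / rootfs rw 0 0\n"
    "none /sys/kernel/debug debugfs rw 0 0\n"
)
print(debugfs_mountpoints(sample))
```

On a live system the input would come from `open("/proc/mounts").read()`; an empty result means the mount command from step 2 still needs to be run.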
Comment 36 Alan Jenkins 2008-08-21 02:06:26 UTC
Created attachment 17352 [details]
Page tables from 2.6.27-rc3
Comment 37 Jeremy Fitzhardinge 2008-08-21 11:10:46 UTC
[    0.000000] kernel direct mapping tables up to 7f6e0000 @ 8000-c000
                                                             ^^^^^^^^^
[    0.000000]   #5 [0000008000 - 000000a000]          PGTABLE ==> [0000008000 - 000000a000]
                                                                    ^^^^^^^^^^     ^^^^^^^^^^^
Comment 38 Jeremy Fitzhardinge 2008-08-21 11:11:22 UTC
Duplicate of Bug #11313
Comment 39 Jeremy Fitzhardinge 2008-08-21 11:11:41 UTC
(or vice versa, given the ordering...)
Comment 40 Rafael J. Wysocki 2008-08-21 11:27:42 UTC
*** Bug 11313 has been marked as a duplicate of this bug. ***
Comment 41 Jeremy Fitzhardinge 2008-08-21 11:43:58 UTC
Could you post the full dmesg output for a boot from before 4f9c11dd49fb73e1ec088b27ed6539681a445988?
Comment 42 Alan Jenkins 2008-08-21 12:25:48 UTC
Created attachment 17363 [details]
2.6.26 dmesg

Here's dmesg from 2.6.26.  I could do 4f9c11dd49fb73e1ec088b27ed6539681a445988~1 tomorrow, if that would be more useful.
Comment 43 Jeremy Fitzhardinge 2008-08-21 12:28:00 UTC
4f9c11dd49fb73e1ec088b27ed6539681a445988~1 would be useful, but the 2.6.26 is an interesting base for comparison.
Comment 44 Alan Jenkins 2008-08-22 01:36:19 UTC
Created attachment 17368 [details]
"last known good" dmesg

Ok, this is what I get from 4f9c11dd49fb73e1ec088b27ed6539681a445988~1, after applying Ingo's boot fix as above (http://lists-archives.org/linux-kernel/16541925-next-0704-x86_64-panics-on-booting.html).
Comment 45 Jeremy Fitzhardinge 2008-08-22 15:38:15 UTC
I'm losing track a bit here.  Do you have a dmesg of exactly 4f9c11dd49fb73e1ec088b27ed6539681a445988 (+fix, if needed) as well?  I'd like to get a minimal difference of good vs bad.

At the moment, comparing the 4f9c11dd49fb73e1ec088b27ed6539681a445988~1 vs "2.6.26-rc9-00681-g1a98fd1 boot messages" shows this difference:

-[    0.000000]   early res: 5 [8000-afff] PGTABLE
+[    0.000000]   #5 [ 0000008000 - 0000009000 ]          PGTABLE ===> [ 0000008000 - 0000009000 ]

(afff vs 9000 (=8ffff)) but it's unclear to me whether that's a real problem or just a difference in how the mapping is performed.
Comment 46 Jeremy Fitzhardinge 2008-08-22 15:39:10 UTC
er, 9000 == 8fff
Comment 47 Alan Jenkins 2008-08-23 01:09:27 UTC
Created attachment 17381 [details]
dmesg from "bad" kernel

Good point.

Here's 4f9c11dd49fb73e1ec088b27ed6539681a445988 + boot fix.
Comment 48 Rafael J. Wysocki 2008-08-23 12:18:02 UTC
Handled-By : Jeremy Fitzhardinge <jeremy@goop.org>
Comment 49 Jeremy Fitzhardinge 2008-08-23 16:41:45 UTC
OK, this shows:
-[    0.000000]   early res: 5 [8000-afff] PGTABLE
+[    0.000000]   early res: 5 [8000-8fff] PGTABLE

But this is expected, because the patch is reusing the boot-time pagetables rather than allocating new ones.  There are no other significant differences, which is also expected.
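
To put numbers on the two "early res" lines: both ranges are printed with an inclusive last byte, so the page counts can be compared with a small Python sketch like the one below (the helper name is illustrative, and 4 KiB pages are assumed).

```python
PAGE_SIZE = 4096


def pages_in_inclusive_range(start, last):
    """Pages covered by [start, last], where last is the final byte."""
    return (last + 1 - start) // PAGE_SIZE


print(pages_in_inclusive_range(0x8000, 0xafff))   # [8000-afff], before the patch
print(pages_in_inclusive_range(0x8000, 0x8fff))   # [8000-8fff], with the patch
```

So the PGTABLE reservation shrinks from three pages to one, which fits the explanation that the boot-time pagetables are being reused instead of new ones being allocated.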

Now, the question is whether this is leading to the reported bug?  I'll put together a patch to revert the reuse to see if that makes the problem go away.  

Also, does booting with mem=1G or other values change the way it crashes?
Comment 50 Rafał Miłecki 2008-08-24 08:01:26 UTC
I cannot test 4f9c11dd49fb... with the mem=X option, as ACPI doesn't work in a system booted this way (http://bugzilla.kernel.org/show_bug.cgi?id=11313#c40).

However, booting 2.6.27-rc4 with mem=1G or mem=2G gives a STABLE system after playing with the HDMI port! I can plug in the HDMI cable, unplug it and start X (init 5)!

Alan: how does mem=X work in your case?
Comment 51 Hugh Dickins 2008-08-25 02:51:28 UTC
My suspicion (but sorry, I'm about to go out, and haven't written a
patch for you to verify this) is that Jeremy's patch gets implicated,
not because of any error in that patch, but because it changes which
physical pages are used for which page tables, moving corruption from
somewhere it would never get noticed (unless perhaps the kernel tried
accessing random addresses) to somewhere it is now sure to be noticed.

That "start = 0x8000;" in arch/x86/mm/init_64.c's find_early_table_space:
I wonder what the history of that particular offset is, and whether it
should be something else in the case of your machines (either for good
reason, or for hacky avoid-bug-in-BIOS reason).

Easy enough for you to patch that to "start = 0x10000;" say; but we
also need to memset page 8 (in Alan's case) or page 11 (in Rafal's)
and check for corruption thereafter, to see if there's any truth in
my suspicion.

Comment 52 Hugh Dickins 2008-08-25 04:19:54 UTC
Here's such a patch as I had in mind, which boots okay but stays silent
for me.  I've no special reason for placing the check in do_page_fault,
just a nearby sourcefile which will check often.  Against 2.6.27-rc4.

--- 2.6.27-rc4/arch/x86/mm/fault.c	2008-07-29 04:24:15.000000000 +0100
+++ linux/arch/x86/mm/fault.c	2008-08-25 12:11:59.000000000 +0100
@@ -680,6 +680,17 @@ void __kprobes do_page_fault(struct pt_r
 		error_code |= PF_USER;
 again:
 #endif
+	{
+		unsigned long *addr = (unsigned long *)__va(0x8000);
+		while (addr < (unsigned long *)__va(0xc000)) {
+			if (*addr) {
+				printk("%p: %lx\n", addr, *addr);
+				*addr = 0;
+			}
+			addr++;
+		}
+	}
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
--- 2.6.27-rc4/arch/x86/mm/init_64.c	2008-08-21 05:52:51.000000000 +0100
+++ linux/arch/x86/mm/init_64.c	2008-08-25 12:11:59.000000000 +0100
@@ -468,6 +468,12 @@ static void __init find_early_table_spac
 	 * need roughly 0.5KB per GB.
 	 */
 	start = 0x8000;
+	while (start < 0xc000) {
+		void *adr = early_ioremap(start, PAGE_SIZE);
+		memset(adr, 0, PAGE_SIZE);
+		early_iounmap(adr, PAGE_SIZE);
+		start += PAGE_SIZE;
+	}
 	table_start = find_e820_area(start, end, tables, PAGE_SIZE);
 	if (table_start == -1UL)
 		panic("Cannot find space for the kernel page tables");
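
The checking loop in the fault.c hunk can be tried out in userspace. The Python sketch below is a simulation under stated assumptions, not kernel code: a zeroed bytearray stands in for low physical memory, offsets stand in for the __va() addresses, and the "corruption" value is made up.

```python
import struct

WORD = 8  # sizeof(unsigned long) on x86-64


def scan_and_clear(mem, start, end):
    """Report and zero every non-zero 8-byte word in mem[start:end].

    Mirrors the loop added to do_page_fault: print what was found,
    then reset it so the next scan only reports new corruption.
    """
    found = []
    for off in range(start, end, WORD):
        (value,) = struct.unpack_from("<Q", mem, off)
        if value:
            found.append((off, value))
            struct.pack_into("<Q", mem, off, 0)
    return found


mem = bytearray(0x10000)                         # 64 KiB of zeroed "low memory"
struct.pack_into("<Q", mem, 0x9010, 0xdeadbeef)  # made-up corruption
for off, value in scan_and_clear(mem, 0x8000, 0xc000):
    print("%#x: %x" % (off, value))
```

Because each hit is zeroed after being reported, a second scan of the same buffer stays silent unless something writes to the region again, which is the behaviour the patch relies on to catch a write during resume.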

Comment 53 Jeremy Fitzhardinge 2008-08-25 10:38:11 UTC
(In reply to comment #51)
> My suspicion (but sorry, I'm about to go out, and haven't written a
> patch for you to verify this) is that Jeremy's patch gets implicated,
> not because of any error in that patch, but because it changes which
> physical pages are used for which page tables, moving corruption from
> somewhere it would never get noticed (unless perhaps the kernel tried
> accessing random addresses) to somewhere it is now sure to be noticed.

Yes, that's my suspicion as well.  We would have seen many more bug reports if there had been something inherently wrong with this patch.

> Easy enough for you to patch that to "start = 0x10000;" say; but we
> also need to memset page 8 (in Alan's case) or page 11 (in Rafal's)
> and check for corruption thereafter, to see if there's any truth in
> my suspicion.

The real difference my change makes is that it continues to use init_level4_pgt, level3_ident_pgt, level3_kernel_pgt, level2_fixmap_pgt, level1_fixmap_pgt, level2_ident_pgt, level2_kernel_pgt and level2_spare_pgt in head_64.S.  These are all early in the .text segment, nestled among code which is never used again after boot.

The same 0x8000-0x[ac]000 space is used either way, so I'm not sure that scanning that region for corruption will really help much.

Comment 54 H. Peter Anvin 2008-08-25 10:58:19 UTC
It's certainly not hard to imagine the first 64K being clobbered by firmware during resume.  It might be a good idea to add a kernel boot-time configuration option to reserve low memory beyond the first 4K (which we always reserve.)
Comment 55 Jeremy Fitzhardinge 2008-08-25 11:32:23 UTC
In the oops in "Kernel log 2.6.27-rc3 init=/bin/bash" the pgd entry is "0", and cr3=0000000000201000 - which is init_level4_pgt.  That's always used as init_mm.pgd, and its use is unchanged by my patch.
Comment 56 Jeremy Fitzhardinge 2008-08-25 11:34:52 UTC
(In reply to comment #54)
> It's certainly not hard to imagine the first 64K being clobbered by firmware
> during resume.  It might be a good idea to add a kernel boot-time
> configuration option to reserve low memory beyond the first 4K (which we always
> reserve.)

Yes, but 4f9c11dd49fb73e1ec088b27ed6539681a445988 doesn't affect whether memory at 0x8000 is used or not, just how much memory is being used.  And given that the oops is showing that a pmd in 0x8000 has been corrupted, it's really unclear what's going on.  One possibility is that memory was always being corrupted there, and it's only now significant.  Maybe Hugh's patch will point something out...
Comment 57 Hugh Dickins 2008-08-25 18:48:24 UTC
Yes, the page at 0x8000 is used in all these cases (without my hack-patch).
My point is that, depending on whereabouts it's used in the pgd/pud/pmd
hierarchy, a corrupted area can easily get to be used for a part of the
vast address space that never actually corresponds to addresses that are
in use; but with the different usage of pages in 4f9c11dd, the corrupted
area gets shifted into a significant position, in the direct 1:1 map.

(I do have some doubts about your reuse of the head_64.S pagetables,
in particular the way level2_ident_pgt omits the NX bit but the direct
map usually sets it.  But that's a very different issue, which I'd
rather come back to after this corruption question is sorted out.)

Comment 58 Alan Jenkins 2008-08-26 01:43:45 UTC
mem=1G works for me.

Hugh's patch also fixes it.  Here's the output:

[  116.546675] PM: Finishing wakeup.
[  116.546675] Restarting tasks ... ffff8800000083e8: 803c85370cfc0000
[  116.553658] ffff8800000083f0: 3000
[  116.564012] done.
Comment 59 Hugh Dickins 2008-08-26 06:00:26 UTC
Thanks, Alan.  Right, this fills me with shame, it tells us what
we knew three weeks ago, that suspend+resume corrupts the long at
0xffff8800000083ec to 0x3000803c85370cfc.
(But Rafal's case is not identical.)
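
The output in comment 58 reports aligned 8-byte words, while the damage straddles two of them. As a rough sketch, assuming the usual little-endian x86-64 layout, the helper below (straddling_word() is a name made up for illustration) rebuilds the 8 bytes that start a given number of bytes into a pair of consecutive aligned words:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper, not kernel code: given two consecutive aligned
 * 64-bit words as read on a little-endian machine, return the 8 bytes
 * that begin `shift` bytes (0..8) into the first word.
 */
static uint64_t straddling_word(uint64_t lo, uint64_t hi, unsigned shift)
{
    assert(shift <= 8);
    if (shift == 0)
        return lo;              /* fully inside the first word */
    if (shift == 8)
        return hi;              /* fully inside the second word */
    /* upper bytes of lo become the low bytes, low bytes of hi the high bytes */
    return (lo >> (8 * shift)) | (hi << (64 - 8 * shift));
}
```

With lo = 0x803c85370cfc0000 (at ...83e8) and hi = 0x3000 (at ...83f0), a shift of 2 yields 0x3000803c85370cfc, which by this arithmetic places the eight bytes at ...83ea; a shift of 4 (...83ec) would yield 0x00003000803c8537.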

And the message comes after "Restarting tasks" merely because of
my lazy (well, rushed) positioning of the check in do_page_fault.

Please let's now try what I originally intended way back then:
revert that patch and apply this patch below, which keeps the same
check through four pages (though in your case we now know that only
one long is corrupted to non-0), but does it during suspend+resume.

I've added the prefix "Corrupted" to make the messages in question
easier to find: please post those messages along with their context
of surrounding messages.  I'm no expert on hardware initialization
or BIOS, but if it's just a BIOS issue then I'd expect the messages
to appear just before the first "EARLY resume" message, whereas if
a driver or device issue then just after its (perhaps LATE) suspend
or (perhaps EARLY) resume message.

--- 2.6.27-rc4/arch/x86/mm/init_64.c	2008-08-21 05:52:51.000000000 +0100
+++ linux/arch/x86/mm/init_64.c	2008-08-25 12:11:59.000000000 +0100
@@ -468,6 +468,12 @@ static void __init find_early_table_spac
 	 * need roughly 0.5KB per GB.
 	 */
 	start = 0x8000;
+	while (start < 0xc000) {
+		void *adr = early_ioremap(start, PAGE_SIZE);
+		memset(adr, 0, PAGE_SIZE);
+		early_iounmap(adr, PAGE_SIZE);
+		start += PAGE_SIZE;
+	}
 	table_start = find_e820_area(start, end, tables, PAGE_SIZE);
 	if (table_start == -1UL)
 		panic("Cannot find space for the kernel page tables");
--- 2.6.27-rc4/drivers/base/power/main.c	2008-07-29 04:21:46.000000000 +0100
+++ linux/drivers/base/power/main.c	2008-08-26 13:20:00.000000000 +0100
@@ -17,6 +17,7 @@
  * subsystem list maintains.
  */
 
+#define DEBUG
 #include <linux/device.h>
 #include <linux/kallsyms.h>
 #include <linux/mutex.h>
@@ -263,6 +264,14 @@ static char *pm_verb(int event)
 
 static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
 {
+	unsigned long *addr = (unsigned long *)__va(0x8000);
+	while (addr < (unsigned long *)__va(0xc000)) {
+		if (*addr) {
+			printk("Corrupted %p: %lx\n", addr, *addr);
+			*addr = 0;
+		}
+		addr++;
+	}
 	dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
 		((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
 		", may wakeup" : "");

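The check added to pm_dev_dbg() above is a plain scan-report-clear loop. A minimal user-space sketch of the same idea, run over an ordinary buffer instead of __va(0x8000)..__va(0xc000), might look like this (scan_and_clear() is an illustrative name, and the return count is an addition for testing, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Scan nwords words that are expected to be zero. Report every
 * non-zero word, then zero it again so that a later scan only
 * reports fresh corruption. Returns the number of words reported.
 */
static size_t scan_and_clear(unsigned long *buf, size_t nwords)
{
    size_t hits = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++) {
        if (buf[i]) {
            printf("Corrupted %p: %lx\n", (void *)&buf[i], buf[i]);
            buf[i] = 0;     /* re-arm the check for the next scan */
            hits++;
        }
    }
    return hits;
}
```

Because each hit is zeroed after being reported, calling the scan at every suspend/resume step narrows down between which two steps the words were written.
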
Comment 60 Ingo Molnar 2008-08-26 06:31:11 UTC
> Please let's now try what I originally intended way back then: revert 
> that patch and apply this patch below, which keeps the same check 
> through four pages (though in your case we now know that only one long 
> is corrupted to non-0), but does it during suspend+resume.

btw., if you know the exact corruption pattern you might want to utilize 
ftrace's function tracing callback to check for the corruption in a 
brute-force way, by essentially doing the check for every kernel 
function that gets called, and to generate a one-time stackdump of the 
incident. (save_stack_trace() can be used to display that dump later on 
- often the thing that triggers such corruption cannot do a printk)

	Ingo

Comment 61 Alan Jenkins 2008-08-26 07:59:01 UTC
Well, I did discourage you by saying "scary", not asking for a patch, and then doing my own thing instead.  (I thought your suggestion was too brittle because the PMD fault numbers varied; it was non-obvious that the corruption was always in the same place).

Blame the BIOS.

[  101.456977] agpgart-intel 0000:00:00.0: LATE suspend
[  101.456977] Back to C!
[  101.456977] Corrupted ffff8800000083e8: 803c85370cfc0000
[  101.456977] Corrupted ffff8800000083f0: 3000
[  101.456977] agpgart-intel 0000:00:00.0: EARLY resume

So from what hpa says my hardware will need a special boot option to avoid suffering from this bug in future?  Though I could just use the existing memmap option ("memmap=64K$0", I think) to mark the first 64K as reserved.
Comment 62 Jeremy Fitzhardinge 2008-08-26 08:18:12 UTC
Hm, a tradeoff.  Should we burn 64k everywhere to deal with a couple of machines, when there's a kernel parameter which will fix it?  The trouble is knowing when you need to use the parameter.

It would be interesting to see if Rafał can reproduce this corruption - then we can generalize something about it.
Comment 63 H. Peter Anvin 2008-08-26 09:38:48 UTC
You're right, the memmap= option should do that.

Keep in mind, too, how little 64K is in the modern world.  An x86 machine with 128 MB is considered anemic today, and that would amount to losing 0.05% of its memory.

However, before considering whether or not to do that generically, I'd like a confirmation that "memmap=64K$0" works at all.
Comment 64 Hugh Dickins 2008-08-26 09:46:07 UTC
Blame the BIOS, yes, it looks like that, and I don't suppose we'll
find out any more about it.  Ingo's ftrace suggestion was helpful,
hadn't crossed my mind, but I don't think it would tell us more.

And using the existing memmap= boot parameter is good thinking, yes,
I'd rather we use what's already available than add something extra.
Even if the first 64kB is theoretically more vulnerable than the rest,
we haven't noticed that in the past, and it seems just a freak effect
of Jeremy's patch that we notice it now in your case.  (There may be
others already using memmap= to avoid such corruption, who could now
stop doing so because of Jeremy's rearrangement.)

" memmap=4k!32k" should be enough for you, Alan.  And using memmap=
will be safer than my patch - poking around afterwards, I notice that
the way I coded it, page 8 wasn't used in the pagetables, but was freed
for use by the system afterwards, so left you liable to worse random
corruption (though I think the page allocator tends to leave those
lowest pages free - hmm, if so, might that be why we haven't noticed
such corruptions in the past??).

And " memmap=4k!44k" is likely to be good for Rafał, though I would
like to hear first what his HDMI case shows with my first patch (the
one with the check in do_page_fault, since suspend+resume is not
relevant to his case).

(By the way, back in #9, I said that I wanted to check that your
global flag is being properly set, since its absence was causing
CPA selftest noise during your bisection.  That was resolved by
the -rc3 pagetables you showed in #36, which show GLB throughout
the Low Kernel Mapping: so we need not worry any more about that.)
Comment 65 Alan Jenkins 2008-08-26 09:52:01 UTC
(In reply to comment #63)
> You're right, the memmap= option should do that.
> 
> Keep in mind, too, how little 64K is in the modern world.  An x86 machine with
> 128 MB is considered anemic today, and that would amount to losing 0.05% of its
> memory.
> 
> However, before considering whether or not to do that generically, I'd like a
> confirmation that "memmap=64K$0" works at all.
> 

Yes, it does work.
Comment 66 Jeremy Fitzhardinge 2008-08-26 10:06:13 UTC
(In reply to comment #64)
> Blame the BIOS, yes, it looks like that, and I don't suppose we'll
> find out any more about it.  Ingo's ftrace suggestion was helpful,
> hadn't crossed my mind, but I don't think it would tell us more.
> 
> And using the existing memmap= boot parameter is good thinking, yes,
> I'd rather we use what's already available than add something extra.
> Even if the first 64kB is theoretically more vulnerable than the rest,
> we haven't noticed that in the past, and it seems just a freak effect
> of Jeremy's patch that we notice it now in your case.  (There may be
> others already using memmap= to avoid such corruption, who could now
> stop doing so because of Jeremy's rearrangement.)
> 
> " memmap=4k!32k" should be enough for you, Alan.  And using memmap=
> will be safer than my patch - poking around afterwards, I notice that
> the way I coded it, page 8 wasn't used in the pagetables, but was freed
> for use by the system afterwards, so left you liable to worse random
> corruption (though I think the page allocator tends to leave those
> lowest pages free - hmm, if so, might that be why we haven't noticed
> such corruptions in the past??).

Quite possibly they were happening without noticeable effect.  Those pages were still being used for pieces of pagetable, but they may have corresponded to some of the vast no-man's land of unused address space.
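
To put numbers on that: an x86-64 page-table page holds 512 eight-byte entries, and with 4 KiB pages one PMD entry covers 2 MiB. The sketch below uses hypothetical helper names (not kernel API) to compute which entry a byte address falls in, and which range that entry would map if the page were a PMD page whose first entry maps table_base. Which level of the hierarchy the corrupted page actually served at is an assumption here, not something the logs establish.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 512u          /* 4096-byte page / 8-byte entries */
#define PMD_ENTRY_SPAN   (2ull << 20)  /* one PMD entry maps 2 MiB */

/* Index of the page-table entry that contains byte address addr. */
static unsigned entry_index(uint64_t addr)
{
    return (unsigned)((addr & 0xfffu) / 8u);
}

/*
 * Start of the range mapped by entry idx, assuming the page is a PMD
 * page whose entry 0 maps table_base (an assumption for illustration).
 */
static uint64_t pmd_entry_start(uint64_t table_base, unsigned idx)
{
    assert(idx < ENTRIES_PER_PAGE);
    return table_base + (uint64_t)idx * PMD_ENTRY_SPAN;
}
```

For the word Alan reported at ...83e8 this gives entry 125. If that page were the first PMD page of the direct map, entry 125 would cover physical 250 MiB to 252 MiB, which is ordinary RAM on most machines; the same slot in a PMD page covering unused address space would never be walked.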

> And " memmap=4k!44k" is likely to be good for Rafał, though I would
> like to hear first what his HDMI case shows with my first patch (the
> one with the check in do_page_fault, since suspend+resume is not
> relevant to his case).

Yes.  I would really like to confirm that these are actually duplicate bugs.  And I wonder how many other machines have similar problems?  Maybe banning the first 64k really is the right answer.
 
> (By the way, back in #9, I said that I wanted to check that your
> global flag is being properly set, since its absence was causing
> CPA selftest noise during your bisection.  That was resolved by
> the -rc3 pagetables you showed in #36, which show GLB throughout
> the Low Kernel Mapping: so we need not worry any more about that.)

Yes, I fixed the inconsistency of __KERNEL_X vs KERNEL_X, where the former doesn't have _PAGE_GLOBAL set (meaning that the static head_64.S pagetables didn't have it).  Also, I changed the CPA test to use another page flag, since _PAGE_GLOBAL isn't necessarily present on all kernel mappings anyway (old 32-bit and paravirtual 64-bit).
Comment 67 Rafał Miłecki 2008-08-26 12:48:10 UTC
I installed the 32-bit version of openSUSE 11.0 to verify a bug in iwlagn (http://sourceforge.net/mailarchive/forum.php?thread_name=b170af450808120331j50a2e8a6o55b3412d8d24bbfa%40mail.gmail.com&forum_name=ipw3945-devel).

What's interesting is that I cannot reproduce this bug using openSUSE 11.0 32-bit with a self-compiled 2.6.27-rc4 using the same configuration as earlier.

I'll reinstall the 64-bit version of openSUSE and try these patches.
Comment 68 Alan Jenkins 2008-08-26 13:09:04 UTC
You could save yourself some effort and just build a 64 bit kernel.  It will happily run a 32-bit userspace.

I guess 32-bit page tables are laid out differently, and the corruption misses the vital parts.
Comment 69 Jeremy Fitzhardinge 2008-08-26 13:17:41 UTC
Or the BIOS bug only manifests when the kernel is running in 64-bit mode.  Not unlikely, given the rarity of 64-bit Windows.
Comment 70 H. Peter Anvin 2008-08-26 13:19:52 UTC
Well, the page tables are definitely laid out differently in 32- and 64-bit mode.  It is, in fact, one of the biggest differences between 32- and 64-bit mode, and rather inherently so.  So no surprise there.
Comment 71 Yinghai Lu 2008-08-26 13:28:14 UTC
Rafal, can you post dmesg for your 64bit kernel and 32bit kernel?
The 32-bit kernel is also supposed to use RAM below 1M for page tables as of 2.6.27-rc1.

Comment 72 Jeremy Fitzhardinge 2008-08-26 13:39:55 UTC
(In reply to comment #70)
> Well, the page tables are definitely laid out differently in 32- and 64-bit
> mode.  It is, in fact, one of the biggest differences between 32- and 64-bit
> mode, and rather inherently so.  So no surprise there.

They're not that different; I think 0x8000 can still be allocated for pagetable use in 32-bit, and with PAE the entries are even the same format.  That said, I think we can fully unify init_mm pagetable construction now (in principle, barring bugs like this).
Comment 73 H. Peter Anvin 2008-08-26 13:42:07 UTC
Well, yes; however, right now they're constructed quite differently, which certainly explains the difference in behaviour.
Comment 74 Rafał Miłecki 2008-08-27 07:26:21 UTC
OK, I tried a few of the posted patches:

1) Clean 2.6.27-rc4 + patch from comment #51 (start = 0x10000)
Works fine, system doesn't crash (and stays stable) after using HDMI "init 5".

2) Clean 2.6.27-rc4 + patch from comment #52 + prefix "fault.c" in printk
Works fine, system stable and doesn't crash. After plugging HDMI I get this in dmesg:
fault.c: ffff88000000be98: b02a000400000000

3) Clean 2.6.27-rc4 + patch from comment #59
Works fine again, "Corrupted" msg doesn't appear in dmesg after using HDMI port.

I will try memmap later.
Comment 75 Rafał Miłecki 2008-08-28 04:28:06 UTC
I was trying memmap options on clean 2.6.27-rc4.

1) memmap=64K$0
Works fine, HDMI doesn't crash my OS

2) memmap=4k!44k
OS doesn't boot:
Kernel alive
Kernel really alive
PANIC: early exception 0e rip 10:ffffffff8021fd37 error 0 cr2 ffffffffff5fc0f0
Comment 76 Alan Jenkins 2008-08-28 06:05:44 UTC
kernel-parameters.txt doesn't say anything about a "!" character in memmap=.  I think Hugh meant "memmap=4k$44k".
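
For reference, the documented form is memmap=nn[KMG]$ss[KMG], which marks nn bytes starting at ss as reserved. The sketch below is a simplified user-space parser written for illustration only (parse_size() and parse_memmap_reserve() are made-up names, not the kernel's implementation); it turns the text after "memmap=" into a half-open byte range and rejects any separator other than "$".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct range { uint64_t start, end; };  /* half-open: [start, end) */

/* Parse a number with an optional K/M/G suffix and advance *s past it. */
static uint64_t parse_size(const char **s)
{
    char *endp;
    uint64_t v = strtoull(*s, &endp, 0);

    switch (*endp) {
    case 'G': case 'g': v <<= 10;   /* fall through */
    case 'M': case 'm': v <<= 10;   /* fall through */
    case 'K': case 'k': v <<= 10; endp++; break;
    default: break;
    }
    *s = endp;
    return v;
}

/* Parse "size$start"; returns 0 on success, -1 on any other form. */
static int parse_memmap_reserve(const char *arg, struct range *out)
{
    assert(arg != NULL && out != NULL);

    uint64_t size = parse_size(&arg);
    if (*arg != '$')
        return -1;                  /* e.g. "4k!44k" is rejected here */
    arg++;
    uint64_t start = parse_size(&arg);
    if (*arg != '\0')
        return -1;
    out->start = start;
    out->end = start + size;
    return 0;
}
```

By this arithmetic "4k$44k" reserves 0xb000-0xc000, which contains the ...be98 address from comment 74, "4k$32k" reserves 0x8000-0x9000, which contains Alan's ...83e8, and "64K$0" reserves 0x0-0x10000.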
Comment 77 Hugh Dickins 2008-08-28 08:40:44 UTC
Aagh, sorry for wasting your time again, indeed I meant "$" not "!"

And regarding your results from the patches (in #74): yes, results
are as expected (the last staying silent because it would only notice
during suspend+resume, not during HDMI plugging).  But just as in
Alan's case, they don't actually tell us anything new, just confirm
what already appeared to be the case; and I doubt we shall learn any
more about it.

I'll leave it to Jeremy, Ingo and hpa to weigh up whether these two
cases now justify reserving the first 64kB on x86_64: I'm unsure.

Comment 78 Ingo Molnar 2008-08-28 08:54:16 UTC
> I'll leave it to Jeremy, Ingo and hpa to weigh up whether these two 
> cases now justify reserving the first 64kB on x86_64: I'm unsure.

definitely - could you please send a patch for it?

I'd suggest to make the initial version DMI quirk driven instead of 
generic - in the hope of this being a one-off (or twice-off) anomaly. 

Could everyone who is affected by this please attach dmidecode output?

If it turns out to be more common than suspected, we could still make it 
unconditional in the future.

	Ingo

Comment 79 Alan Jenkins 2008-08-28 09:00:08 UTC
Created attachment 17508 [details]
dmidecode on faulty system

Hah.  Well, here's my DMI info.  Problem is it's useless, at least if you're limited to the same IDs as s2ram uses.

sudo s2ram -i
This machine can be identified by:
    sys_vendor   = "OEM"
    sys_product  = "OEM"
    sys_version  = "OEM"
    bios_version = "6.00 PG"
Comment 80 Ingo Molnar 2008-08-28 09:05:04 UTC
> Hah.  Well, here's my DMI info.  Problem is it's useless, at least if you're
> limited to the same IDs as s2ram uses.
> 
> sudo s2ram -i
> This machine can be identified by:
>     sys_vendor   = "OEM"
>     sys_product  = "OEM"
>     sys_version  = "OEM"
>     bios_version = "6.00 PG"

bah ...

is there any indication about which exact area the BIOS really needs? 
Maybe it's the first 8K instead of the first 4K? Wasting +4K of RAM is 
not a big deal. Wasting 60K (on all systems, all around the globe) we 
should try to avoid.

Or is there perhaps an indication somewhere about which area to protect? 
Does the EBDA show it perhaps?

	Ingo

Comment 81 Jeremy Fitzhardinge 2008-08-28 09:26:44 UTC
I think we should unconditionally reserve the first 64k, and add a debug option to do a corruption scan along the lines of Hugh's patch.  That way we can get a sense of how common this kind of lowmem corruption is.
Comment 82 Ingo Molnar 2008-08-28 09:30:59 UTC
> I think we should unconditionally reserve the first 64k, and add a 
> debug option to do a corruption scan along the lines of Hugh's patch.  
> That way we can get a sense of how common this kind of lowmem 
> corruption is.

ok. Please send a patch - if it's unintrusive enough we might still be 
able to get it into 2.6.27.

	Ingo

Comment 83 Rafał Miłecki 2008-08-28 12:54:23 UTC
Just to make the diagnosis 100% sure: memmap=4k$44k works fine.
Comment 84 Rafał Miłecki 2008-08-28 12:55:36 UTC
Created attachment 17510 [details]
dmidecode
Comment 85 Jeremy Fitzhardinge 2008-08-28 12:59:12 UTC
Created attachment 17511 [details]
Proposed patch for mainline to workaround this problem and detect other instances of corruption

Alan, Rafał: could you test this patch and see if it solves the problem and prints the expected warnings?
Comment 86 Rafał Miłecki 2008-08-29 03:01:39 UTC
Created attachment 17526 [details]
dmesg | grep -i corrup

After booting 2.6.27-rc5 with proposed patch applied I get only:
zajec@sony:~> dmesg | grep -i corr
scanning 2 areas for BIOS corruption

So I tried s2ram (-f -p -m, which works for me), and attached is the output of dmesg | grep -i corr after waking up.
Comment 87 Rafael J. Wysocki 2008-08-30 14:59:17 UTC
Patch : http://marc.info/?l=linux-kernel&m=122001615314700&w=2
Comment 88 Andy Wettstein 2008-09-12 08:24:11 UTC
Just to help confirm suspicions that this bug may be widespread: I'm seeing this problem on an MSI AMD socket 754 motherboard with VIA K8T800 chipset and AMI BIOS running the 64 bit kernel.  I get the "unable to handle kernel paging request" when trying to resume from S3.  Booting with memmap=64k$0 makes suspend/resume work fine.  The 32 bit kernel works fine without any extra boot options.

Should I add any output to this bug report from this system?
Comment 89 Jeremy Fitzhardinge 2008-09-12 09:11:25 UTC
That's interesting to know.  Could you include the output of the corruption message here?

How much memory were you scanning for corruption?  More than the default 64k I guess.
Comment 90 Hugh Dickins 2008-09-12 09:25:27 UTC
Thanks for the report.  It would be interesting to see how closely
yours resembles the previous two cases - but when I say interesting,
we'll probably just say, "ooh, that's interesting, it's different"
or "ooh, that's interesting, it's the same", and not much more -
so you may not want to go to great lengths to slake our curiosity!

But if you can, please attach your dmesg, preferably the whole dmesg
from boot through suspend/resume through the failure(s) - if there
are lots all showing the same PMD line, no need for more than one.

Even (slightly) better would be to apply the patch indicated in
#87 (Jeremy's made a further patch series since, but the one in
#87 should be good enough) and send the dmesg from boot through
suspend/resume and the next couple of minutes after resume (it
does a check every minute).  I say _slightly_ better because I
don't really expect it to tell us anything more than the previous
dmesg; but it _might_ find some other places corrupted.

The 32-bit kernel is probably working fine because it's using the
corrupted pages in such a way the corruption doesn't hit anything
that matters: they may well be tied up in the GFP_DMA page reserve,
with nothing much needing those pages, or relying on their contents
across suspend/resume.

Thank you; but as I say, we probably won't work out anything much
from the info you supply, so don't break a leg getting it to us.

Comment 91 Jeremy Fitzhardinge 2008-09-12 09:45:04 UTC
(In reply to comment #90)
> Thanks for the report.  It would be interesting to see how closely
> yours resembles the previous two cases - but when I say interesting,
> we'll probably just say, "ooh, that's interesting, it's different"
> or "ooh, that's interesting, it's the same", and not much more -
> so you may not want to go to great lengths to slake our curiosity!

Well, if we can get a sense of how common the problem is, we can determine whether we should ban that area of memory by default.  It's certainly not "incredibly rare" any more.
 
> The 32-bit kernel is probably working fine because it's using the
> corrupted pages in such a way the corruption doesn't hit anything
> that matters: they may well be tied up in the GFP_DMA page reserve,
> with nothing much needing those pages, or relying on their contents
> across suspend/resume.

If the 32-bit kernel is scanning that area for corruption and not finding it, then it means that the BIOS is only causing corruption when the kernel is running in 64-bit mode.  Which is interesting.
Comment 92 Hugh Dickins 2008-09-12 10:03:03 UTC
(In reply to comment #90)

I've assumed that Andy is not running a kernel with any scanning patches,
just observing corruption as it originally appeared: that the corruption
is below 64kB and so memmap=64k$0 inoculates against it successfully in
the 64-bit case, but it fell on fallow ground in the 32-bit case.

But it would be interesting to try the #87 patch with both 64-bit and
32-bit kernels (since it seems Andy's already equipped to try both),
to check that they then behave in exactly the same way, as we'd expect.

Comment 93 Andy Wettstein 2008-09-12 11:19:48 UTC
(In reply to comment #92)
> (In reply to comment #90)
> 
> I've assumed that Andy is not running a kernel with any scanning patches,
> just observing corruption as it originally appeared: that the corruption
> is below 64kB and so memmap=64k$0 inoculates against it successfully in
> the 64-bit case, but it fell on fallow ground in the 32-bit case.
> 
> But it would be interesting to try the #87 patch with both 64-bit and
> 32-bit kernels (since it seems Andy's already equipped to try both),
> to check that they then behave in exactly the same way, as we'd expect.

You are correct that I am not using the bios scanning corruption patch.  I have a 64 bit kernel compiled with it applied, but I haven't had time to test it yet (There seems to also be a problem with suspend/resume with the motherboard's promise SATA controller, so that complicated the troubleshooting.  I switched to the VIA controller and that works fine).  

I'll try and get the dmesg and test out the patched kernel sometime this weekend.  
Comment 94 Andy Wettstein 2008-09-12 15:46:24 UTC
Created attachment 17755 [details]
dmesg 2.6.26 suspend/resume
Comment 95 Andy Wettstein 2008-09-12 15:48:27 UTC
I've attached the dmesg for the suspend/resume problem.

I tried 2.6.27-rc5 with the patch applied.  The machine resets itself on resume, so that is a bit of a problem.
Comment 96 Jeremy Fitzhardinge 2008-09-12 16:17:22 UTC
Are you reporting that there's been a general suspend/resume regression?  Does it still fail when you boot with memmap=64k$0?
Comment 97 Andy Wettstein 2008-09-12 19:34:20 UTC
(In reply to comment #96)
> Are you reporting that there's been a general suspend/resume regression?  Does
> it still fail when you boot with memmap=64k$0?
> 

There is definitely something going on with 2.6.27 kernels.  I just tested without the patch applied both with and without the memmap option.  In both cases the machine resets itself when resuming.

I am using the debian experimental builds for 2.6.27-rc5 from here:
http://kernel-archive.buildserver.net/debian-kernel

I've only tested 64 bit so far.
Comment 98 Andy Wettstein 2008-09-12 21:44:14 UTC
Created attachment 17756 [details]
dmesg 2.6.27-rc5 bios corruption
Comment 99 Andy Wettstein 2008-09-12 21:50:03 UTC
Ok,

32 bit suspend/resume doesn't reboot itself.  I compiled with the bios corruption detection patch and attached the output.
Comment 100 Hugh Dickins 2008-09-13 01:42:17 UTC
I'm now thoroughly confused.  I thought that patch (the one #87 points
to) only detects corruption between 0 and 0x10000, but your output is
showing corruption (or good use) between 0x243e0 and 0x268c4.

A bug in the patch, or you were running a later patchset from
Jeremy and changed the default, or I'm just in a muddle?

Sorry, I'll be offline for a day or so now: over to Jeremy I hope.

Comment 101 Jeremy Fitzhardinge 2008-09-13 07:55:15 UTC
Andy, as Hugh says, that output doesn't look right.  It looks like the output of my earlier buggy patch which scanned beyond the regions it was supposed to.

I also think the resume regression you're seeing is a separate bug, and you should file a new one for it.
Comment 102 Rafael J. Wysocki 2008-09-14 16:29:51 UTC
Andy, please file a separate bug report for the resume regression you're seeing, thanks.
Comment 103 Andy Wettstein 2008-09-14 19:06:03 UTC
Created attachment 17780 [details]
corrected dmesg 2.6.27-rc6 bios corruption
Comment 104 Andy Wettstein 2008-09-14 19:07:40 UTC
Yes, you were right about the bad patch.  Sorry.  I've updated to the newer one and attached the new output.  Hopefully it looks better.
Comment 105 Ingo Molnar 2008-09-14 23:36:14 UTC
> [...] I've updated to the newer one and attached the new output.  
> Hopefully it looks better.

thanks - it shows corruption in the 0xc000-0xc400 range. That's a 1 KB 
block at 48KB. The content looks structured and intentional, but i don't 
recognize it straight away.

i'm wondering, what's the EBDA of this box? The bootup log suggests it's 
at 0x9f400:

   [000009f400 - 0000100000]    BIOS reserved

but i'm not completely sure.

i've added a printout into tip/master:

  # 23ea780: x86: print out EBDA/lowmem address

could you please try to boot the latest tip/master:

   http://people.redhat.com/mingo/tip.git/README

and post a new dmesg? (straight after bootup - but it's fine with a 
corruption message included.) tip/master has all the detection patches 
included.

the BIOS itself seems quirkable as well:

   RSDT 3FFF0000, 002C (r1 AMIINT VIA_K8

do we have the dmidecode info of this system attached already?

	Ingo

Comment 106 Andy Wettstein 2008-09-15 15:01:10 UTC
Created attachment 17793 [details]
msi dmidecode
Comment 107 Andy Wettstein 2008-09-15 15:03:47 UTC
Created attachment 17794 [details]
dmesg from tip

Here is the dmesg from tip with the EBDA printout and I've attached the dmidecode from this machine, too.
Comment 108 Ingo Molnar 2008-09-16 00:42:27 UTC
> Here is the dmesg from tip with the EBDA printout and I've attached 
> the dmidecode from this machine, too.

thanks - the EBDA is at the expected place:

 [    0.000000] BIOS EBDA/lowmem at: 0009fc00/0009f400

so there's no clue i can see in the logs or in other system environment 
data that would notify the kernel that there's some extra activity 
expected at physical address 0xc000. Memory 0xc000 is marked by the BIOS 
as general purpose RAM:

[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)

and we utilize that RAM in Linux and only reserve the first 4K [which is 
customary BIOS scratch area] - so it's 635K of perfectly fine RAM, and 
the kernel will break if anything gets modified in that RAM. The commit 
that got identified in the bisection just happened to move around 
allocations so that we broke in a more apparent (and more violent) way.

So, based on your dmidecode info i've created a quirk for AMI BIOSen, to 
reserve 0xc000 forcibly, and applied it to the 
tip/x86/memory-corruption-check tree:

 # 8a64124: x86: add DMI quirk for AMI BIOS which corrupts address 0xc000 during resume

Could you please check the end result and try latest tip/master with 
CONFIG_X86_CHECK_BIOS_CORRUPTION _disabled_? The kernel should just work 
out of box, and you should get the new DMI quirk printk during bootup. 
Please attach that dmesg output too, so that we can double check the end 
result.

If this commit works as expected then this is queued up for v2.6.28 
merging as part of the x86 tree, alongside the memory-corruption-check 
feature.

Thanks,

	Ingo

Comment 109 Ingo Molnar 2008-09-16 01:20:12 UTC
> If this commit works as expected then this is queued up for v2.6.28 
> merging as part of the x86 tree, alongside the memory-corruption-check 
> feature.

ok, i've reviewed all the dmidecode and dmesg data in this bugzilla and 
the problem is much more widespread than suspected. Both AMI and Phoenix 
BIOSes are affected, both old and new BIOSes. There's no good date 
cutoff and no good BIOS type and system type cutoff either.

So the only really sane way to proceed is to reserve the first 64K of 
memory on all AMI and Phoenix BIOSen. (Probably a lot of laptops are 
affected as well so we might want to extend this in the future.)

This means that in practice the majority of x86 boxes will have this 
quirk activated - but that's the only sane way to approach this, we 
simply need help from the BIOS vendors to create any more specific 
quirks that don't result in the wasting of 60K of RAM. (the first 4K is 
already reserved)

I've added a new kernel option as well to disable these quirks - that 
can be used on those systems where the BIOS behaves.

I've added all these quirks to tip/master. Everyone who reported 
suspend/resume problems (and HDMI plug/unplug problems, etc.) and memory 
corruption in this bugzilla, please try to boot the latest tip/master 
kernel:

  http://people.redhat.com/mingo/tip.git/README

_without_ enabling the CONFIG_X86_CHECK_BIOS_CORRUPTION=y feature.

The kernel should print a warning about an AMI or Phoenix BIOS and 
should reserve the first 64K, and the system should just work out of box 
with no crashes and hangs.

I suspect this will resolve a lot of historic 'suspend+resume does not 
work under Linux' bugs as well.

Thanks,
	
	Ingo

Comment 110 Yinghai Lu 2008-09-16 02:20:11 UTC
On Tue, Sep 16, 2008 at 1:19 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
>> If this commit works as expected then this is queued up for v2.6.28
>> merging as part of the x86 tree, alongside the memory-corruption-check
>> feature.
>
> ok, i've reviewed all the dmidecode and dmesg data in this bugzilla and
> the problem is much more widespread than suspected. Both AMI and Phoenix
> BIOSes are affected, both old and new BIOSes. There's no good date
> cutoff and no good BIOS type and system type cutoff either.
>
> So the only really sane way to proceed is to reserve the first 64K of
> memory on all AMI and Phoenix BIOSen. (Probably a lot of laptops are
> affected as well so we might want to extend this in the future.)
>
> This means that in practice the majority of x86 boxes will have this
> quirk activated - but that's the only sane way to approach this, we
> simply need help from the BIOS vendors to create any more specific
> quirks that dont result in the wasting of 60K of RAM. (the first 4K is
> already reserved)
>
> I've added a new kernel option as well to disable these quirks - that
> can be used on those systems where the BIOS behaves.
>
> I've added all these quirks to tip/master. Everyone who reported
> suspend/resume problems (and HDMI plug/unplug problems, etc.) and memory
> corruption in this bugzilla, please try to boot the latest tip/master
> kernel:
>
>  http://people.redhat.com/mingo/tip.git/README
>
> _without_ enabling the CONFIG_X86_CHECK_BIOS_CORRUPTION=y feature.
>
> The kernel should print a warning about an AMI or Phoenix BIOS and
> should reserve the first 64K, and the system should just work out of box
> with no crashes and hangs.
>

This may have a problem:
static int __init dmi_low_memory_corruption(const struct dmi_system_id *d)
{
        printk(KERN_NOTICE
                "%s detected: BIOS may corrupt low RAM, working it around.\n",
                d->ident);

        reserve_early(0x0, 0x10000, "BIOS quirk");

        return 0;
}

because it will clash with the pre-fixed early_res entries:
static struct early_res early_res[MAX_EARLY_RES] __initdata = {
        { 0, PAGE_SIZE, "BIOS data page" },     /* BIOS data page */
#if defined(CONFIG_X86_64) && defined(CONFIG_X86_TRAMPOLINE)
        { TRAMPOLINE_BASE, TRAMPOLINE_BASE + 2 * PAGE_SIZE, "TRAMPOLINE" },
#endif
#if defined(CONFIG_X86_32) && defined(CONFIG_SMP)
        /*
         * But first pinch a few for the stack/trampoline stuff
         * FIXME: Don't need the extra page at 4K, but need to fix
         * trampoline before removing it. (see the GDT stuff)
         */
        { PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE" },
        /*
         * Has to be in very low memory so we can execute
         * real-mode AP code.
         */
        { TRAMPOLINE_BASE, TRAMPOLINE_BASE + PAGE_SIZE, "TRAMPOLINE" },
#endif
        {}
};


please change
        reserve_early(0x0, 0x10000, "BIOS quirk");
to
        reserve_early_overlap_ok(0x0, 0x10000, "BIOS quirk");

YH
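
For illustration only, the clash described above can be sketched as a plain range-overlap check. This is not the kernel's early_res code: the `sketch_*` names and the `SKETCH_PAGE_SIZE` constant are hypothetical, and the table holds just the "BIOS data page" entry shown in the snippet above, assuming 4K pages.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 0x1000UL /* assumption: 4K pages, as on x86 */

/* Hypothetical stand-in for one early reservation: [start, end). */
struct sketch_res {
	unsigned long start;
	unsigned long end;
	const char *name;
};

/* Two half-open ranges overlap when each starts before the other ends. */
static int sketch_overlaps(const struct sketch_res *r,
			   unsigned long start, unsigned long end)
{
	return r->start < end && start < r->end;
}

/* Return the first table entry that clashes with [start, end), or NULL. */
static const struct sketch_res *
sketch_find_overlap(const struct sketch_res *tab, size_t n,
		    unsigned long start, unsigned long end)
{
	size_t i;

	assert(start < end);
	for (i = 0; i < n; i++)
		if (sketch_overlaps(&tab[i], start, end))
			return &tab[i];
	return NULL;
}
```

With a table containing `{ 0, SKETCH_PAGE_SIZE, "BIOS data page" }`, a lookup for the range 0x0 to 0x10000 returns that entry, while a lookup starting at `SKETCH_PAGE_SIZE` returns NULL. A strict reservation routine would presumably refuse the first case, which is why an overlap-tolerant variant is requested here.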

Comment 111 Ingo Molnar 2008-09-16 03:17:51 UTC
* Yinghai Lu <yhlu.kernel@gmail.com> wrote:

> please change:
>         reserve_early(0x0, 0x10000, "BIOS quirk");
> to
>         reserve_early_overlap_ok(0x0, 0x10000, "BIOS quirk");

done - thanks! The right tip/master is:

# tip/master 0714a44: Merge branch 'x86/memory-corruption-check'

or later.

	Ingo

Comment 112 Jeremy Fitzhardinge 2008-09-16 08:00:53 UTC
Ingo, I'd really like to get a better sense of how widespread this corruption is rather than masking it right now.  I think we should reserve a larger area (possibly up to 1M, since we don't really have any reason to believe that it's strictly the first 64k) and do the non-zero scan, and enable it by default in some moderately widely used kernel.

While I'm perfectly OK with doing this kind of DMI detection to work around the problem, I'm not convinced we're really sure what the problem is yet.
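
A minimal sketch of what a "non-zero scan" could look like, assuming the reserved region was zeroed beforehand and nothing else is supposed to write to it. The helper name `sketch_scan_nonzero` is hypothetical and this is not the CONFIG_X86_CHECK_BIOS_CORRUPTION implementation; it only illustrates the idea of counting words that changed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: walk a region that was zeroed earlier and count
 * the words that are no longer zero. When at least one changed word is
 * found and 'first' is not NULL, its index is stored in *first.
 */
static size_t sketch_scan_nonzero(const unsigned long *mem, size_t nwords,
				  size_t *first)
{
	size_t i, hits = 0;

	assert(mem != NULL);
	for (i = 0; i < nwords; i++) {
		if (mem[i] == 0)
			continue;
		if (hits == 0 && first != NULL)
			*first = i;
		hits++;
	}
	return hits;
}
```

Running such a scan periodically, and after events like resume, over an area larger than 64k would show whether the corruption really stays within the first 64k.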
Comment 113 H. Peter Anvin 2008-09-16 10:42:23 UTC
It's worse than that.  We have at least one report where the corruption is near 256K.
Comment 114 Pavel Machek 2008-09-17 00:36:56 UTC
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> 
> > please change:
> >         reserve_early(0x0, 0x10000, "BIOS quirk");
> > to
> >         reserve_early_overlap_ok(0x0, 0x10000, "BIOS quirk");
> 
> done - thanks! The right tip/master is:
> 
> # tip/master 0714a44: Merge branch 'x86/memory-corruption-check'
> 
> or later.

overlap_ok? Will that mean we still use known-bad memory for
trampolines?

The question is what memory that sometimes spontaneously changes its
contents is good for. I guess it would be okay for a random seed ;-).
								Pavel

Comment 115 Andy Wettstein 2008-09-17 07:36:19 UTC
Created attachment 17837 [details]
dmesg with CONFIG_X86_RESERVE_LOW_64K=y

I've tested it and suspend/resume works fine without additional options.  It seems a little weird because I don't see anything being printed out that says it is reserving 64k.
Comment 116 Andy Wettstein 2008-09-17 07:43:41 UTC
Now that I think about it a bit more, I am using the 32-bit kernel, which didn't have problems suspending/resuming for me (even with the memory corruption problems).  So maybe that new option isn't working correctly, since I don't see any output about reserving the 64k.  I had problems suspending/resuming with the 64-bit kernel, but I can't test that because of bug #11568.
Comment 117 Jeremy Fitzhardinge 2008-09-17 09:26:10 UTC
(In reply to comment #114)
> overlap_ok? Will that mean we still use known-bad memory for
> trampolines?

They haven't moved in a long time, so apparently it isn't an issue.
Comment 118 Ingo Molnar 2008-09-18 03:46:49 UTC
> I've tested it and suspend/resume works fine without additional 
> options.  It seems a little weird because I don't see anything being 
> printed out that says it is reserving 64k.

ah - my bad.

I missed the detail that dmi_scan_machine() [which scans and parses the 
system strings into kernel data structures and initializes the whole DMI 
quirk mechanism] is called _after_ this quirk handler. So all the 
strings were empty and the quirk had no chance to kick in.

i'll also put a WARN() into drivers/firmware/dmi_scan.c so that we are 
notified about too early dmi_check_system() calls ;-)
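
The ordering problem can be sketched in user-space C as follows. Everything here is hypothetical (the `sketch_*` names, the return codes, the use of stderr in place of WARN()); it is not the code in drivers/firmware/dmi_scan.c, only an illustration of why a check made before the scan sees empty strings and how a guard can flag that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *sketch_bios_vendor = "";	/* empty until scanned */
static int sketch_scanned;

/* Stand-in for the scan step that fills in the system strings. */
static void sketch_dmi_scan(const char *vendor_from_firmware)
{
	sketch_bios_vendor = vendor_from_firmware;
	sketch_scanned = 1;
}

/*
 * Stand-in for a quirk check. Returns 1 on a match, 0 on no match,
 * and -1 (with a warning) when called before the scan has run.
 */
static int sketch_dmi_check(const char *wanted)
{
	assert(wanted != NULL);
	if (!sketch_scanned) {
		fprintf(stderr, "warning: DMI check for \"%s\" before scan\n",
			wanted);
		return -1;
	}
	return strstr(sketch_bios_vendor, wanted) != NULL;
}
```

Without the guard, the early call would simply compare against an empty string and report "no match", which is the silent failure described above.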

Could you please re-test latest tip/master, does the BIOS quirk message 
show up now? (Should come right after the "DMI 2.3" line)

OTOH, it would be nice to check a 64-bit kernel, where this problem 
showed up, to make sure this is a fix. Plus it would be nice to have a 
deeper understanding about where exactly that corruption comes from ...

	Ingo

Comment 119 Rafał Miłecki 2008-09-19 06:59:33 UTC
Created attachment 17883 [details]
dmesg from tip (commit 74546a8cd9a4e2...)
Comment 120 Rafał Miłecki 2008-09-19 07:02:18 UTC
I tried git tip but it _does not_ fix the problem with my HDMI.

grep 'CONFIG_X86' .config | egrep 'RESERVE|CORRUPT'
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y

dmesg is attached, I'll add photos of what happens after connecting HDMI and switching to init 5.
Comment 121 Rafał Miłecki 2008-09-19 07:17:46 UTC
Created attachment 17884 [details]
photos of messages after typing "init 5"
Comment 122 Andy Wettstein 2008-09-21 19:11:31 UTC
I updated to git tip and I still didn't see anything in the dmesg that says the memory has been reserved.
Comment 123 Ingo Molnar 2008-09-22 00:28:59 UTC
> I updated to git tip and I still didn't see anything in the dmesg that 
> says the memory has been reserved.

yes - sorry about that: the workaround is not active because DMI 
scanning is not active. The problem is that moving DMI scanning earlier 
crashes the box. Still working on this issue.

	Ingo

Comment 124 Yinghai Lu 2008-09-22 01:09:03 UTC
On Mon, Sep 22, 2008 at 12:28 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
>> I updated to git tip and I still didn't see anything in the dmesg that
>> says the memory has been reserved.
>
> yes - sorry about that: the workaround is not active because DMI
> scanning is not active. The problem is that moving DMI scanning earlier
> crashes the box. Still working on this issue.

32 bit crash?
64 bit should not crash at least.

YH

Comment 125 Rafał Miłecki 2008-09-26 09:13:20 UTC
Just wanted to ask if there is any progress... Were you able Ingo to work out DMI scanning?
Don't want to hustle too much but definitely would be nice to remove this before 2.6.27.
Comment 126 Ingo Molnar 2008-09-26 10:11:51 UTC
> Just wanted to ask if there is any progress... Were you able Ingo to 
> work out DMI scanning? Don't want to hustle too much but definitely 
> would be nice to remove this before 2.6.27.

tip/master is supposed to work out of box - could you check it?

	Ingo

Comment 127 Rafał Miłecki 2008-09-26 10:23:03 UTC
Ingo: what kernel configuration do you mean? Do you mean
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
in tip/master?
Comment 128 Rafał Miłecki 2008-09-26 10:40:16 UTC
Ah, in tip I just found

commit 2216d199b1430d1c0affb1498a9ebdbd9c0de439
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date:   Mon Sep 22 02:52:26 2008 -0700

    x86: fix CONFIG_X86_RESERVE_LOW_64K=y

    The bad_bios_dmi_table() quirk never triggered because we do DMI setup
    too late. Move it a bit earlier.

So you probably mean the config from comment #127. Will test this in a moment.
Comment 129 Rafał Miłecki 2008-09-26 12:01:41 UTC
Created attachment 18058 [details]
dmesg from tip (commit 7f4da50cdcac...)

With current tip reservation of low memory works! I can plug in my HDMI and it doesn't crash machine:
> DMI 2.4 present.
> AMI BIOS detected: BIOS may corrupt low RAM, working it around.

Is this patch from tip (adding CONFIG_X86_RESERVE_LOW_64K) acceptable for mainline?
Comment 130 Ingo Molnar 2008-09-27 08:38:22 UTC
> With current tip reservation of low memory works! I can plug in my 
> HDMI and it doesn't crash machine:
>
> > DMI 2.4 present.
> > AMI BIOS detected: BIOS may corrupt low RAM, working it around.
> 
> Is this patch from tip (adding CONFIG_X86_RESERVE_LOW_64K) acceptable 
> for mainline?

not for v2.6.27, but maybe for v2.6.28. Jeremy suggested there might be 
some other reason for this corruption (non-BIOS corruption?), but to me 
it looks as if an SMI handler assumed that certain bits of low 
memory are BIOS-reserved and corrupted about 1K of memory. So i guess in 
the long run we'll have to reserve this RAM.

	Ingo

Comment 131 Ingo Molnar 2008-09-27 10:52:22 UTC
> I updated to git tip and I still didn't see anything in the dmesg that 
> says the memory has been reserved.

btw., is the 'spontaneous reboot on resume' problem fixed in tip/master 
too?

	Ingo

Comment 132 Andy Wettstein 2008-09-27 12:33:22 UTC
(In reply to comment #131)
> > I updated to git tip and I still didn't see anything in the dmesg that 
> > says the memory has been reserved.
> 
> btw., is the 'spontaneous reboot on resume' problem fixed in tip/master 
> too?

Just updated to latest tip and it still reboots on resume.  

I do see the message about corrupting low RAM, now.
Comment 133 Ingo Molnar 2008-09-27 13:50:57 UTC
> Just updated to latest tip and it still reboots on resume.

hm, reboot on resume is a tough issue. Is that problem bisectable? (i.e. 
does any recent-ish kernel work fine?)

	Ingo

Comment 134 Andy Wettstein 2008-09-29 12:01:02 UTC
(In reply to comment #133)
> 
> hm, reboot on resume is a tough issue. Is that problem bisectable? (i.e. 
> does any recent-ish kernel work fine?)

I started a bisect for this.  I'll add comments to bug #11568.

Comment 135 Rafał Miłecki 2008-10-12 10:11:00 UTC
May I bring this to your attention again, now that 2.6.28-rc1 is being prepared?

Jeremy: do you still have doubts about the kind of corruption?
Comment 136 Rafał Miłecki 2008-10-14 07:24:15 UTC
Nice, I can see this in mainline:

commit fc38151947477596aa27df6c4306ad6008dc6711
Author: Ingo Molnar
Date:   Tue Sep 16 10:07:34 2008 +0200

    x86: add X86_RESERVE_LOW_64K

Followed by fixing commits:
fc38151947477596aa27df6c4306ad6008dc6711
2216d199b1430d1c0affb1498a9ebdbd9c0de439

The bug can probably be closed now?
