Bug 12357
| Summary: | 2.6.28 kernel panic on GA-MA790FX-DQ6 | | |
|---|---|---|---|
| Product: | Other | Reporter: | Marcus Husar (marcus.husar) |
| Component: | Modules | Assignee: | platform_x86_64 (platform_x86_64) |
| Status: | CLOSED INVALID | | |
| Severity: | high | CC: | akpm, alan, marcus.husar, mingo |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.28 vanilla kernel | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | .config of my kernel (2.6.28-git5); The whole kernel panic; minicom.2.6.26.cap.gz; minicom.2.6.28.cap.gz | | |
Description
Marcus Husar
2009-01-04 03:07:27 UTC
Created attachment 19634 [details]
.config of my kernel (2.6.28-git5)
Created attachment 19635 [details]
The whole kernel panic
Your hardware is dying:

HARDWARE ERROR
CPU 0: Machine Check Exception: 4 Bank 4: fe0000080005001b
TSC 2f219167c2 ADDR 20000554 MISC c008000001000000
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check

Hello Marcin,

I have seen that message but didn't believe it (I don't want my hardware to die). So I thought there must be a kernel bug. Older kernels worked without any noticeable problem. I will try to capture the same message with a working Debian-kernel. It will take me an hour because I'm at home at the moment.

That is what mcelog says:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0 TSC 2f219167c2
STATUS 0 MCGSTATUS 0

Now I thought, perhaps the CPU is dying. But then I found this message from the LKML:
http://lkml.indiana.edu/hypermail/linux/kernel/0605.1/2085.html

I will replace the memory and hope that kernel 2.6.28 is booting up properly. If this doesn't help I'll replace the CPU and then the mainboard. Anyway I'll notify you what happened.

Thank you for your efforts.

Marcus

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12357
>
> ------- Comment #3 from marcin.slusarz@gmail.com 2009-01-05 12:31 -------
> Your hardware is dying:
>
> HARDWARE ERROR
> CPU 0: Machine Check Exception: 4 Bank 4: fe0000080005001b
> TSC 2f219167c2 ADDR 20000554 MISC c008000001000000
> This is not a software problem!
> Run through mcelog --ascii to decode and contact your hardware vendor
> Kernel panic - not syncing: Machine check

I replaced the memory. Now the kernel panic has gone. The problems with smp.c and the "Machine Check Exception" are resolved.
But there is still a problem with ioremap.c and the netsc520 (the whole bootup process is attached as minicom.2.6.28.cap.gz):

------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:240 __ioremap_caller+0x173/0x2f5()
Hardware name: GA-MA790FX-DQ6
Pid: 1, comm: swapper Not tainted 2.6.28-git5-1 #1
Call Trace:
 [<ffffffff8023cdfe>] warn_slowpath+0xd3/0x10d
 [<ffffffff80226cd5>] __change_page_attr_set_clr+0x16b/0x867
 [<ffffffff80251648>] down_trylock+0x28/0x2e
 [<ffffffff80251648>] down_trylock+0x28/0x2e
 [<ffffffff8023d2ff>] try_acquire_console_sem+0x10/0x31
 [<ffffffff80996bca>] init_netsc520+0x0/0xfd
 [<ffffffff806f4bb9>] printk+0x4e/0x56
 [<ffffffff80226717>] __ioremap_caller+0x173/0x2f5
 [<ffffffff80996bfd>] init_netsc520+0x33/0xfd
 [<ffffffff80996bca>] init_netsc520+0x0/0xfd
 [<ffffffff80996bfd>] init_netsc520+0x33/0xfd
 [<ffffffff80209051>] _stext+0x51/0x120
 [<ffffffff802e80c6>] create_proc_entry+0x7d/0x92
 [<ffffffff80270e01>] register_irq_proc+0x94/0xac
 [<ffffffff8096d613>] kernel_init+0x119/0x16b
 [<ffffffff8020ceba>] child_rip+0xa/0x20
 [<ffffffff8096d4fa>] kernel_init+0x0/0x16b
 [<ffffffff8020ceb0>] child_rip+0x0/0x20
---[ end trace fefc41b665ffb5f9 ]---

Booting up with kernel 2.6.26 none of the problems above appear. Even if the replaced memory is used (minicom.2.6.26.cap.gz is also attached).

Best regards,
Marcus

Created attachment 19669 [details]
minicom.2.6.26.cap.gz
Created attachment 19670 [details]
minicom.2.6.28.cap.gz
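The mcelog output quoted in this report shows STATUS 0, so it may help to look at the raw status word from the panic message (fe0000080005001b) by hand. The following is a minimal sketch, assuming the commonly documented architectural layout of an MCi_STATUS register (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The function name `decode_mci_status` and the dictionary keys are hypothetical, and no attempt is made to interpret the vendor-specific fields.

```python
# Flag bits in the upper part of an MCi_STATUS value.
# Assumption: architectural layout; vendor-specific bits are not decoded.
FLAG_BITS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MISC register contains valid data
    58: "ADDRV",  # ADDR register contains valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a 64-bit status word into flag names and error-code fields."""
    flags = [
        name
        for bit, name in sorted(FLAG_BITS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


# Status value taken from the panic message in this report.
decoded = decode_mci_status(0xFE0000080005001B)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

Under that layout, the UC and PCC flags being set is consistent with the kernel treating the event as fatal rather than as a logged, corrected error.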
> But there is still a problem with ioremap.c and the netsc520 (the whole
> bootup process is attached as minicom.2.6.28.cap.gz):
Well, it's not a real problem. The warning just tells you that the
resource which is accessed is in the RAM address space.
BIOS-e820: 0000000000100000 - 000000007fee0000 (usable)
NetSc520 flash device: 0x100000 at 0x200000
The driver is for an evaluation board and has a hard coded address for
the FLASH chip. On your machine there is definitely no such device at
this address and the ioremap code complains correctly that this access
is wrong.
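The overlap behind the warning can be checked with simple range arithmetic. This is only an illustrative sketch, not the kernel's actual check: it assumes the driver's log line reads as "size 0x100000 at physical address 0x200000", and the helper name `overlaps` is hypothetical.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the half-open ranges [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


# Usable RAM range from the BIOS-e820 line in the boot log.
usable_ram = (0x00100000, 0x7FEE0000)

# Flash window from the NetSc520 log line.
# Assumption: the first number is the size, the second the start address.
flash_size, flash_start = 0x100000, 0x200000
flash = (flash_start, flash_start + flash_size)

print(overlaps(*flash, *usable_ram))
```

With these numbers the flash window lies entirely inside the usable RAM range, which matches the explanation that the remapped region is ordinary memory on this machine.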
Actually this driver is complete crap. The ioremap succeeds despite
the warning and the driver somehow pretends that it found a flash
device:
Creating 4 MTD partitions on "netsc520 Flash Bank":
0x00000000-0x000c0000 : "NetSc520 boot kernel"
That means it poked in the ioremapped RAM.
Please disable the driver for now. I will look into fixing this along with
some others of the same category, as there is trouble waiting.
@Venki: I wonder why the remap succeeds at all.
Thanks,
tglx