Bug 11699
Summary: | 2.6.27-rc-7: BUG: scheduling while atomic, c1e_idle+0x98/0xe0 | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Rafael J. Wysocki (rjw) |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | mpartap, tglx |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27-rc-7 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 11167 | ||
Attachments: |
screenshot after crash
another crash
shot of the MCE occurring |
Description
Rafael J. Wysocki
2008-10-04 11:33:48 UTC
Thomas, I think this is fixed now, correct? At least I haven't had the problem again (now on 2.6.27.3). But I also don't know how to reproduce it properly.
Created attachment 18934
screenshot after crash
Created attachment 18935
another crash
My system has been going down very regularly because of this for about a week or two. That coincided with inserting an ATI PCIe card into my system, but since I get crashes with either the ATI proprietary driver or the radeonhd driver, I don't know whether that is relevant. So far I haven't managed to get my kdump kernel to boot after the freeze, for whatever reason, so I am voting to reopen this bug and attaching two JPEG stack dumps ;) ...or should I rather open a new bug for a newer version?
One of the screenshots shows a machine check exception, which means that something is seriously wrong with your system. The second screenshot also has the machine check taint flag set ("M"). That's a hardware problem. Does it go away when you replace the PCIe card you added recently? Any chance to capture the printks via a serial console?
Thanks,
tglx
Created attachment 19073
shot of the MCE occurring
Hi Thomas,
Sorry for the delay; I haven't yet got a null-modem cable to set up a serial console, nor have I taken that PCIe card out. The strange thing is that in between those incidents my machine ran fine for almost a week (?), and now it is back to regular crashes; I just watched another one occur. The attached screenshot shows the syslog on tty12, and feeding that line into mcelog --ascii manually gives:
# echo "CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f" |mcelog --ascii
WARNING: with --dmi mcelog --ascii must run on the same machine with the
same BIOS/memory configuration as where the machine check occurred.
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge Northbridge Watchdog error
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'generic participation, request timed out
generic error mem transaction
generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
Can you make anything of this?
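For readers wondering how mcelog arrives at the lines above: the STATUS word follows the architectural MCi_STATUS layout documented in the Intel SDM and AMD APM. The following is a minimal Python sketch, not part of mcelog, that decodes only the architectural flag bits, the MCA error code in bits 15:0 using the bus-error format 0000 1PPT RRRR IILL, and the MCGSTATUS value. The function and table names are hypothetical. Bits 31:16 are model-specific and are returned raw rather than interpreted.

```python
"""Minimal decoder for architectural x86 machine-check status words (sketch)."""

# Architectural MCi_STATUS flag bits (Intel SDM Vol. 3, AMD APM Vol. 2).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # a later error overwrote this one
    61: "UC",     # error uncorrected
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

# Architectural MCG_STATUS bits.
MCG_STATUS_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def set_flags(value: int, table: dict[int, str]) -> list[str]:
    """Return the names of all bits from `table` set in `value`, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if (value >> bit) & 1]


def decode_mci_status(status: int) -> dict:
    mca_code = status & 0xFFFF            # bits 15:0, architectural error code
    model_code = (status >> 16) & 0xFFFF  # bits 31:16, model-specific, left raw
    result = {
        "flags": set_flags(status, MCI_STATUS_FLAGS),
        "mca_code": mca_code,
        "model_code": model_code,
    }
    # Bus/interconnect error codes have the form 0000 1PPT RRRR IILL.
    if (mca_code & 0xF800) == 0x0800:
        result["bus_error"] = {
            "PP": (mca_code >> 9) & 0x3,    # participation (3 = generic)
            "T": (mca_code >> 8) & 0x1,     # 1 = request timed out
            "RRRR": (mca_code >> 4) & 0xF,  # request type (0 = generic error)
            "II": (mca_code >> 2) & 0x3,    # memory or I/O (3 = generic)
            "LL": mca_code & 0x3,           # cache level (3 = generic)
        }
    return result


if __name__ == "__main__":
    print(decode_mci_status(0xB200000000070F0F))
    print(set_flags(4, MCG_STATUS_FLAGS))
```

For STATUS b200000000070f0f this reports the flags VAL, UC, EN and PCC (mcelog prints only bit57 and bit61, since VAL and EN are expected on any logged error), and a bus error with every field generic except T=1, which matches mcelog's "request timed out". MCGSTATUS 4 decodes to MCIP alone, i.e. RIPV is clear, so execution could not be safely restarted.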
How stupid... after making sure memtest showed no problems, I removed the card, and now with the onboard nForce GPU the crashes are gone, but so is my X performance (shared-memory fake VRAM)... Is this a known problem with nForce4 chipsets in combination with ATI PCIe cards?
> How stupid... after making sure memtest showed no problems, I
> removed the card, and now with the onboard nForce GPU the crashes are
> gone, but so is my X performance (shared-memory fake VRAM)... Is this
> a known problem with nForce4 chipsets in combination with ATI PCIe
> cards?
Not that I know of. I have no idea what might cause the massive
corruption on your board. Have you already checked for BIOS updates
for the board?
Thanks,
tglx