Bug 11699

Summary: 2.6.27-rc-7: BUG: scheduling while atomic, c1e_idle+0x98/0xe0
Product: Platform Specific/Hardware
Reporter: Rafael J. Wysocki (rjw)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED CODE_FIX    
Severity: normal
CC: mpartap, tglx
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27-rc-7
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 11167    
Attachments: screenshot after crash
             another crash
             shot of the MCE occuring

Description Rafael J. Wysocki 2008-10-04 11:33:48 UTC
Subject    : 2.6.27-rc-7: BUG: scheduling while atomic: swapper/0/0x00000102
Submitter  : Prakash Punnoor <prakash@punnoor.de>
Date       : 2008-09-28 17:45
References : http://marc.info/?l=linux-kernel&m=122262403415629&w=4
Handled-By : Thomas Gleixner <tglx@linutronix.de>

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-10-24 14:09:41 UTC
Thomas, I think this is fixed now, correct?
Comment 2 Prakash Punnoor 2008-10-26 02:07:35 UTC
At least I haven't had the problem again (now being on 2.6.27.3). But I also don't know how to reproduce it reliably.
Comment 3 Marcel Partap 2008-11-19 03:17:17 UTC
Created attachment 18934 [details]
screenshot after crash
Comment 4 Marcel Partap 2008-11-19 03:18:06 UTC
Created attachment 18935 [details]
another crash
Comment 5 Marcel Partap 2008-11-19 03:20:34 UTC
My system has been going down very regularly because of this for about a week or two. That coincided with inserting an ATI PCIE card into my system, but as I am getting crashes with both the ATI proprietary driver and the radeonhd driver, I don't know if that is relevant here. So far I haven't been successful in getting my kdump kernel to boot after the freeze, for whatever reason, so I am voting for reopening this bug and attaching two JPEG stack dumps ;)
Comment 6 Marcel Partap 2008-11-19 11:24:16 UTC
...or should I rather open a new bug for a newer version?
Comment 7 Thomas Gleixner 2008-11-19 14:01:43 UTC
One of the screenshots shows a machine check exception, which means
that there is something seriously wrong with your system.

The second screenshot also has the machine check taint flag set ("M"). 

That's a hardware problem. Does it go away when you remove the PCIE
card you added recently?

Any chance to capture the printks via a serial console?

Thanks,

	tglx
Comment 8 Marcel Partap 2008-11-29 10:00:15 UTC
Created attachment 19073 [details]
shot of the MCE occuring

Hi Thomas,
sorry for the delay. I have not yet got a null-modem cable to set up a serial console, nor have I taken that PCIE card out. The strange thing is that right in between those incidents my machine ran fine for almost a week (?), and now it is back to regular crashes; I just watched another one occur. The attached screenshot shows the syslog on tty12, and feeding that line into mcelog --ascii manually gives:
# echo "CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f" |mcelog --ascii
WARNING: with --dmi mcelog --ascii must run on the same machine with the
     same BIOS/memory configuration as where the machine check occurred.
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge   Northbridge Watchdog error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'generic participation, request timed out
      generic error mem transaction
      generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
Can you make anything of this?
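
For reference, the two flags mcelog reports above (bit57 and bit61) can be read straight off the STATUS value. A minimal Python sketch, assuming the architectural MCi_STATUS flag layout (bits 63 down to 57); the names STATUS_FLAGS, decode_status and mca_error_code are made up for this illustration and are not part of mcelog:

```python
# Architectural flag bits in the upper part of an MCi_STATUS value.
# Assumption: only these seven flags are decoded; vendor-specific
# fields (e.g. the extended error code) are intentionally ignored.
STATUS_FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status):
    """Return the names of the flag bits set in status, highest bit first."""
    return [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


def mca_error_code(status):
    """Return the low 16 bits, which hold the MCA error code."""
    return status & 0xFFFF


if __name__ == "__main__":
    status = 0xB200000000070F0F  # STATUS value from the log line above
    for flag in decode_status(status):
        print(flag)
    print("MCA error code:", hex(mca_error_code(status)))
```

For the value above this lists VAL, UC, EN and PCC, which agrees with the "error uncorrected" and "processor context corrupt" lines in the mcelog output.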
Comment 9 Marcel Partap 2008-12-02 19:29:22 UTC
How stupid... After making sure memtest showed no problems, I removed the card, and now with the onboard nforce GPU the crashes are gone, but so is my X performance (shared memory as fake VRAM). Is this a known problem with nforce4 chipsets in combination with ATI PCIE cards?
Comment 10 Thomas Gleixner 2008-12-12 00:03:40 UTC
> How stupid... After making sure memtest showed no problems, I
> removed the card, and now with the onboard nforce GPU the crashes are
> gone, but so is my X performance (shared memory as fake VRAM). Is this
> a known problem with nforce4 chipsets in combination with ATI PCIE
> cards?

Not that I know. I have no idea what might cause the massive
corruption on your board. Did you check for BIOS updates for the board
already?

Thanks,

	tglx