Bug 5342 - mce_log() related kernel hang on 8CPU dual core Opteron
Summary: mce_log() related kernel hang on 8CPU dual core Opteron
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 high
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-10-01 03:05 UTC by Mark Williamson
Modified: 2007-09-03 21:02 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.14-rc3
Subsystem:
Regression: ---
Bisected commit-id:



Description Mark Williamson 2005-10-01 03:05:56 UTC
Most recent kernel where this bug did not occur:
2.6.13 (I think..)

Distribution:
RHEL AS4/U1, but with a stock kernel from kernel.org

Hardware Environment:
8CPU Dual Core Opteron; http://www.iwill.net/product_2.asp?p_id=90&sp=Y

Software Environment:
RHEL AS4/U1

Problem Description:
Running a piece of chemistry software crashes the machine, and the
following message appears. It was captured on another machine via the
serial console:

....{blah}
ip_tables: (C) 2000-2002 Netfilter core team
NMI Watchdog detected LOCKUP on CPU 14
CPU 14
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfm
t_misc dm_mod tsdev usbhid video thermal processor fan container button battery
ac ohci_hcd usbcore i2c_amd75
6 i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.14-rc2-smp #2
RIP: 0010:[<ffffffff80115080>] <ffffffff80115080>{mce_log+16}
RSP: 0018:ffff810143873e80  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000411
RDX: 0000000000000000 RSI: a40000000005001b RDI: ffff810143873f18
RBP: 0000000000000010 R08: a40000000005001b R09: ffff810143873f18
R10: ffff81044384e9c0 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000411 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000041415960(0000) GS:ffffffff804f5f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000041414fe0 CR3: 0000000343b23000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810005660000, task ffff81000565c3c0)
Stack: ffffffff801154f3 000000000000000f 000000004384d6e0 0000182ad53e4a36
       0000000100000001 ffffffff80539780 0000000000000000 ffffffff80487f30
       0000000000000000 ffff810143873f4c
Call Trace: <IRQ> <ffffffff801154f3>{do_machine_check+675}
<ffffffff8013e653>{run_timer_softirq+499}
       <ffffffff80118bb9>{smp_call_function_interrupt+73}
<ffffffff8010e8e0>{call_function_interrupt+132}


Steps to reproduce:
Execute the piece of chemistry software.
Comment 1 Mark Williamson 2005-10-03 13:13:48 UTC
The same test case, retried with 2.6.14-rc3:


{generic dmesg....blah}
3w-xxxx: scsi0: AEN: INFO: Initialization started: Unit #0.
3w-xxxx: scsi0: AEN: INFO: Initialization complete: Unit #0.

CPU 28: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf801c4

CPU 18: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf82707

CPU 26: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf81bae

CPU 16: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf83f95

CPU 24: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf85160

CPU 30: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf86b5d

CPU 22: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 10311caf861df
Kernel panic - not syncing: Machine check
 NMI Watchdog detected LOCKUP on CPU 12
CPU 12
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30023, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff80118a34>] <ffffffff80118a34>{__smp_call_function+116}
RSP: 0000:ffff81073fe21ca8  EFLAGS: 00000097
RAX: 0000000000000009 RBX: 000000000000000f RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000010 RDI: 000000000000efff
RBP: 0000000000000000 R08: 00000000000000fc R09: 0000000000000010
R10: 0000000000000000 R11: ffffffff8011b670 R12: ffffffff80118b30
R13: 0000000000000000 R14: 00010311caf7ed0f R15: ffffffff8035de08
FS:  0000000040411960(0063) GS:ffffffff804fce00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaab6eee7b8 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30023, threadinfo ffff810576d2e000, task ffff81057efa8e80)
Stack: ffffffff80118b30 0000000000000000 0000000000000009 ffffffff00000000
       ffff81074386c500 0000000000000000 0000000000000000 0000000000000000
       ffffffff80409080 ffffffff80118b90
Call Trace: <#MC> <ffffffff80118b30>{smp_really_stop_cpu+0}
<ffffffff80118b90>{smp_send_stop+64}
       <ffffffff80135242>{panic+210} <ffffffff8010f7bc>{oops_begin+92}
       <ffffffff80115198>{print_mce+136} <ffffffff80115286>{mce_panic+166}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
        <EOE>

Code: 39 d8 74 08 f3 90 eb f4 66 66 66 90 85 ed 74 0e 8b 44 24 14
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU 0
Kernel panic - not syncing: Aiee, killing interrupt handler!
 CPU 0
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30025, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff803411b9>] <ffffffff803411b9>{.text.lock.spinlock+2}
RSP: 0000:ffffffff8047ddd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000010
RDX: 0000000000000000 RSI: ffffffff8047de78 RDI: ffffffff80406bac
RBP: 000000000000001f R08: b200000000070f0f R09: ffffffff8047dec8
R10: 0000000000000010 R11: 0000000000000003 R12: ffffffff8047de78
R13: ffffffff80408780 R14: 00010311caf81e70 R15: ffffffff8035de08
FS:  0000000040c13960(0063) GS:ffffffff804fc800(0000) knlGS:000000005555d660
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaaaaaac000 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30025, threadinfo ffff810553d62000, task ffff81057efa80c0)
Stack: ffffffff8010f796 0000000000000000 0000000000000046 0000000000000000
       ffffffff8011520a 0000000000000000 0000000000000005 0000000000000014
       ffffffff8047df58 0000000000000415
Call Trace: <#MC> <ffffffff8010f796>{oops_begin+54} <ffffffff8011520a>{mce_panic+42}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
       <ffffffff80227352>{__bitmap_weight+18}  <EOE>  <IRQ>
<ffffffff8012f854>{scheduler_tick+660}
       <ffffffff8013eca4>{update_process_times+260}
<ffffffff80119329>{smp_apic_timer_interrupt+57}
       <ffffffff8010e96c>{apic_timer_interrupt+132}  <EOI>

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd
console shuts up ...
Badness in do_unblank_screen at drivers/char/vt.c:2857

Call Trace: <NMI> <ffffffff80279d79>{do_unblank_screen+73}
<ffffffff8011f76c>{bust_spinlocks+28}
       <ffffffff8010f7e5>{oops_end+21} <ffffffff8010f9c1>{die_nmi+113}
       <ffffffff801199a2>{nmi_watchdog_tick+242}
<ffffffff801100b4>{default_do_nmi+132}
       <ffffffff80119a95>{do_nmi+69} <ffffffff8010ee3f>{nmi+127}
       <ffffffff803411b9>{.text.lock.spinlock+2}  <EOE>  <#MC>
<ffffffff8010f796>{oops_begin+54}
       <ffffffff8011520a>{mce_panic+42} <ffffffff801156e0>{do_machine_check+1072}
       <ffffffff8010f117>{machine_check+127} <ffffffff80227352>{__bitmap_weight+18}
        <EOE>  <IRQ> <ffffffff8012f854>{scheduler_tick+660}
       <ffffffff8013eca4>{update_process_times+260}
<ffffffff80119329>{smp_apic_timer_interrupt+57}
       <ffffffff8010e96c>{apic_timer_interrupt+132}  <EOI>
 <7>APIC error on CPU0: 00(08)
NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30026, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff803411b9>] <ffffffff803411b9>{.text.lock.spinlock+2}
RSP: 0000:ffff81073ff84dd0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000012
RDX: 0000000000000002 RSI: ffff81073ff84e78 RDI: ffffffff80406bac
RBP: 000000000000001f R08: b200000000070f0f R09: ffff81073ff84ec8
R10: 0000000000048a86 R11: 0000000000048a84 R12: ffff81073ff84e78
R13: ffffffff80408780 R14: 00010311caf813cb R15: ffffffff8035de08
FS:  0000000041014960(0063) GS:ffffffff804fc900(0000) knlGS:000000005555d660
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaabc899450 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30026, threadinfo ffff8105ee76a000, task ffff8105d2c71080)
Stack: ffffffff8010f796 02fa8318538bffff 0000000000000046 0000000000000000
       ffffffff8011520a 9066906666087b8b 0000000000000005 0000000000000014
       ffff81073ff84f58 0000000000000415
Call Trace: <#MC> <ffffffff8010f796>{oops_begin+54} <ffffffff8011520a>{mce_panic+42}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
        <EOE>

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU 6
CPU 6
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30028, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff803411b9>] <ffffffff803411b9>{.text.lock.spinlock+2}
RSP: 0000:ffff81073ff61dd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000016
RDX: 0000000000000006 RSI: ffff81073ff61e78 RDI: ffffffff80406bac
RBP: 000000000000001f R08: b200000000070f0f R09: ffff81073ff61ec8
R10: 000000000001ef24 R11: 000000000001ef22 R12: ffff81073ff61e78
R13: ffffffff80408780 R14: 00010311caf83dc8 R15: ffffffff8035de08
FS:  0000000041816960(0063) GS:ffffffff804fcb00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaac0460760 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30028, threadinfo ffff8105ee76e000, task ffff81057efa5520)
Stack: ffffffff8010f796 0000000000000000 0000000000000046 0000000000000000
       ffffffff8011520a 0000000000000000 0000000000000005 0000000000000014
       ffff81073ff61f58 0000000000000415
Call Trace: <#MC> <ffffffff8010f796>{oops_begin+54} <ffffffff8011520a>{mce_panic+42}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
        <EOE>

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU 8
CPU 8
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30037, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff803411b9>] <ffffffff803411b9>{.text.lock.spinlock+2}
RSP: 0000:ffff8106438a1dd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000018
RDX: 0000000000000008 RSI: ffff8106438a1e78 RDI: ffffffff80406bac
RBP: 000000000000001f R08: b200000000070f0f R09: ffff8106438a1ec8
R10: 000000000004c794 R11: 000000000004c792 R12: ffff8106438a1e78
R13: ffffffff80408780 R14: 00010311caf82881 R15: ffffffff8035de08
FS:  0000000043c1f960(0063) GS:ffffffff804fcc00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006d3df0 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30037, threadinfo ffff8106289a4000, task ffff8105d6112820)
Stack: ffffffff8010f796 0000000000000000 0000000000000046 0000000000000000
       ffffffff8011520a 0000000000000000 0000000000000005 0000000000000014
       ffff8106438a1f58 0000000000000415
Call Trace: <#MC> <ffffffff8010f796>{oops_begin+54} <ffffffff8011520a>{mce_panic+42}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
        <EOE>

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU 10
CPU 10
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30027, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff803411b9>] <ffffffff803411b9>{.text.lock.spinlock+2}
RSP: 0000:ffff8106438e1dd0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 000000000000000a RCX: 000000000000001a
RDX: 000000000000000a RSI: ffff8106438e1e78 RDI: ffffffff80406bac
RBP: 000000000000001f R08: b200000000070f0f R09: ffff8106438e1ec8
R10: 0000000000000001 R11: 0000000000000005 R12: ffff8106438e1e78
R13: ffffffff80408780 R14: 00010311caf803a5 R15: ffffffff8035de08
FS:  0000000041415960(0063) GS:ffffffff804fcd00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaabe67c1a0 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30027, threadinfo ffff8105ee76c000, task ffff8105d2c636e0)
Stack: ffffffff8010f796 9066906666ffffff 0000000000000046 0000000000000000
       ffffffff8011520a da894837348d49db 0000000000000005 0000000000000014
       ffff8106438e1f58 0000000000000415
Call Trace: <#MC> <ffffffff8010f796>{oops_begin+54} <ffffffff8011520a>{mce_panic+42}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
       <ffffffff8010e8e8>{apic_timer_interrupt+0}  <EOE>

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU 14
CPU 14
Modules linked in: ipv6 autofs4 pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod af_packet tsdev usbhid video
thermal processor fan container button battery ac ohci_hcd usbcore i2c_amd756
i2c_core e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core
3w_xxxx sd_mod scsi_mod
Pid: 30024, comm: l502.exe Tainted: G   M  2.6.14-rc3-smp #2
RIP: 0010:[<ffffffff803411b9>] <ffffffff803411b9>{.text.lock.spinlock+2}
RSP: 0000:ffff81073fe61dd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 000000000000000e RCX: 000000000000001e
RDX: 000000000000000e RSI: ffff81073fe61e78 RDI: ffffffff80406bac
RBP: 000000000000001f R08: b200000000070f0f R09: ffff81073fe61ec8
R10: 0000000000035bd4 R11: 0000000000035bd2 R12: ffff81073fe61e78
R13: ffffffff80408780 R14: 00010311caf8445b R15: ffffffff8035de08
FS:  0000000040812960(0063) GS:ffffffff804fcf00(0000) knlGS:000000005555d660
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000055555000 CR3: 000000042dd1c000 CR4: 00000000000006e0
Process l502.exe (pid: 30024, threadinfo ffff810553d60000, task ffff81057efa87a0)
Stack: ffffffff8010f796 03be00000001bac9 0000000000000046 0000000000000000
       ffffffff8011520a fd058b48157f7024 0000000000000005 0000000000000014
       ffff81073fe61f58 0000000000000415
Call Trace: <#MC> <ffffffff8010f796>{oops_begin+54} <ffffffff8011520a>{mce_panic+42}
       <ffffffff801156e0>{do_machine_check+1072}
<ffffffff8010f117>{machine_check+127}
        <EOE>

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd
console shuts up ...
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
APIC error on CPU0: 08(08)
{previous line repeated many times}
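
Every Machine Check Exception line above reports the same Bank 4 status word, b200000000070f0f. A minimal sketch for splitting that word into its fields, assuming the architectural x86 MCi_STATUS layout (flag bits 63 down to 57, model-specific code in bits 31:16, MCA error code in bits 15:0). The function name decode_mci_status is hypothetical, and the model-specific code is extracted but not interpreted:

```python
# Sketch only: decode the architectural fields of an x86 MCi_STATUS value.
# Bit positions are assumed from the architectural MCA register layout;
# the meaning of the model-specific code is CPU-specific and not decoded here.
FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status):
    """Return a dict of flag booleans plus the two 16-bit error codes."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["mca_error_code"] = status & 0xFFFF
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    return fields


if __name__ == "__main__":
    for name, value in decode_mci_status(0xB200000000070F0F).items():
        # Print flags as True/False and the error codes in hex.
        print(name, value if isinstance(value, bool) else hex(value))
```

Under that assumed layout, the reported value has VAL, UC, EN and PCC set, and OVER, MISCV and ADDRV clear, so no error address was recorded alongside it.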
Comment 2 Mark Williamson 2005-10-05 05:26:45 UTC
Also occurring with 2.6.13.3:
=============================

{generic dmesg....blah}
EXT3-fs: mounted filesystem with ordered data mode.
ip_tables: (C) 2000-2002 Netfilter core team

CPU 28: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aeddffdae9 

CPU 24: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aeddffe42a 

CPU 18: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aede0012fe 

CPU 30: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aede00153c 

CPU 26: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aede0020ee 

CPU 20: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aede003b51 

CPU 22: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aede0050fd 

CPU 16: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aede00414e 
Kernel panic - not syncing: Machine check
 NMI Watchdog detected LOCKUP on CPU12CPU 12 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6743, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff80118b34>] <ffffffff80118b34>{__smp_call_function+116}
RSP: 0000:ffff81073ff79c98  EFLAGS: 00000097
RAX: 0000000000000008 RBX: 000000000000000f RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000010 RDI: 000000000000efff
RBP: 0000000000000000 R08: 00000000000000fa R09: 0000000000000010
R10: 0000000000000000 R11: ffffffff8011b5a0 R12: ffffffff80118c30
R13: 0000000000000000 R14: 000005aeddffd689 R15: ffffffff80354e48
FS:  0000000041014960(0063) GS:ffffffff804f0e00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aab44c4dd78 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6743, threadinfo ffff81033f8d0000, task ffff810749b72e70)
Stack: ffffffff80118c30 0000000000000000 6f63796500000008 3d20353100000000 
       0000000000000400 0000000000000000 0000000000000000 0000000000000000 
       ffffffff803fd4c0 ffffffff80118c90 
Call Trace: <#MC> <ffffffff80118c30>{smp_really_stop_cpu+0}
<ffffffff80118c90>{smp_send_stop+64}
       <ffffffff80134f62>{panic+210} <ffffffff801152b8>{print_mce+136}
       <ffffffff801153a6>{mce_panic+166} <ffffffff80115810>{do_machine_check+1088}
       <ffffffff8010ee4b>{machine_check+127}  <EOE> 

Code: 39 d8 74 08 f3 90 eb f4 66 66 66 90 85 ed 74 0e 8b 44 24 14 
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU0<0>Kernel panic - not syncing: Aiee,
killing interrupt handler!
 CPU 0 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6745, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffffffff80470ed0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000010
RDX: 0000000000000000 RSI: ffffffff80470f78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffffffff80470fc8
R10: 0000000000028263 R11: 0000000000028261 R12: ffffffff80470f78
R13: ffffffff803fcbc0 R14: 000005aede0021c5 R15: ffffffff80354e48
FS:  0000000041816960(0063) GS:ffffffff804f0800(0000) knlGS:000000005555d660
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000055555000 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6745, threadinfo ffff81033fbda000, task ffff810343ba2fb0)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a 0000000000000000 
       0000000000000005 0000000000000014 ffffffff80471058 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
Badness in do_unblank_screen at drivers/char/vt.c:2822

Call Trace: <NMI> <ffffffff80276369>{do_unblank_screen+73}
<ffffffff8011f6ac>{bust_spinlocks+28}
       <ffffffff8010f515>{oops_end+21} <ffffffff8010f6d7>{die_nmi+103}
       <ffffffff80119a82>{nmi_watchdog_tick+242}
<ffffffff80110094>{default_do_nmi+132}
       <ffffffff80119b69>{do_nmi+73} <ffffffff8010eb53>{nmi+127}
       <ffffffff8033b2e9>{.text.lock.spinlock+2}  <EOE>  <#MC>
<ffffffff8010f4cd>{oops_begin+45}
       <ffffffff8011532a>{mce_panic+42} <ffffffff80115810>{do_machine_check+1088}
       <ffffffff8010ee4b>{machine_check+127}  <EOE> 
 NMI Watchdog detected LOCKUP on CPU4CPU 4 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6747, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffff810243865dd0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000014
RDX: 0000000000000004 RSI: ffff810243865e78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffff810243865ec8
R10: 0000000000024531 R11: 000000000002452f R12: ffff810243865e78
R13: ffffffff803fcbc0 R14: 000005aede0023a6 R15: ffffffff80354e48
FS:  0000000042018960(0063) GS:ffffffff804f0a00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaac5a25d08 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6747, threadinfo ffff81033fbde000, task ffff81023fbca850)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a a0ab894866780000 
       0000000000000005 0000000000000014 ffff810243865f58 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU2CPU 2 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6746, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffff81013ffdfdd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000012
RDX: 0000000000000002 RSI: ffff81013ffdfe78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffff81013ffdfec8
R10: 00000000000183a3 R11: 00000000000183a1 R12: ffff81013ffdfe78
R13: ffffffff803fcbc0 R14: 000005aede000ccc R15: ffffffff80354e48
FS:  0000000041c17960(0063) GS:ffffffff804f0900(0000) knlGS:000000005555d660
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaaaaaac000 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6746, threadinfo ffff81033fbdc000, task ffff81023fbca170)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a 006000180077000a 
       0000000000000005 0000000000000014 ffff81013ffdff58 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU6CPU 6 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6618, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffff81063ffb9dd0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000016
RDX: 0000000000000006 RSI: ffff81063ffb9e78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffff81063ffb9ec8
R10: 000000000002946f R11: 000000000002946d R12: ffff81063ffb9e78
R13: ffffffff803fcbc0 R14: 000005aede002f1e R15: ffffffff80354e48
FS:  00002aaab1db9ee0(0000) GS:ffffffff804f0b00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aabc3e68648 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6618, threadinfo ffff81043efa6000, task ffff81023fd53810)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a 3331203136303032 
       0000000000000005 0000000000000014 ffff81063ffb9f58 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU8CPU 8 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6741, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffff81083ff5ddd0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000018
RDX: 0000000000000008 RSI: ffff81083ff5de78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffff81083ff5dec8
R10: 000000000001fb03 R11: 000000000001fb01 R12: ffff81083ff5de78
R13: ffffffff803fcbc0 R14: 000005aeddffdec5 R15: ffffffff80354e48
FS:  0000000040812960(0063) GS:ffffffff804f0c00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aab053366f8 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6741, threadinfo ffff81033f95c000, task ffff810749b1f750)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a 0000000000000000 
       0000000000000005 0000000000000014 ffff81083ff5df58 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU10CPU 10 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6742, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffff81043ff2bdd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 000000000000000a RCX: 000000000000001a
RDX: 000000000000000a RSI: ffff81043ff2be78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffff81043ff2bec8
R10: 0000000000011655 R11: 0000000000011653 R12: ffff81043ff2be78
R13: ffffffff803fcbc0 R14: 000005aede0011ed R15: ffffffff80354e48
FS:  0000000040c13960(0063) GS:ffffffff804f0d00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aab24fbec30 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6742, threadinfo ffff81033f95e000, task ffff810749b72790)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a 90666666c35b0014 
       0000000000000005 0000000000000014 ffff81043ff2bf58 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU14CPU 14 
Modules linked in: ipv6 autofs4 af_packet pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core binfmt_misc dm_mod tsdev evdev usbhid video thermal
processor fan container button battery ac ohci_hcd usbcore i2c_amd756 i2c_core
e100 mii e1000 floppy amd74xx ide_generic ext3 jbd ide_disk ide_core 3w_xxxx
sd_mod scsi_mod
Pid: 6744, comm: l502.exe Tainted: G   M  2.6.13.3-smp
RIP: 0010:[<ffffffff8033b2e9>] <ffffffff8033b2e9>{.text.lock.spinlock+2}
RSP: 0000:ffff81014388cdd0  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 000000000000000e RCX: 000000000000001e
RDX: 000000000000000e RSI: ffff81014388ce78 RDI: ffffffff803fb10c
RBP: 000000000000001f R08: 0000000000000005 R09: ffff81014388cec8
R10: 000000000001349b R11: 0000000000013499 R12: ffff81014388ce78
R13: ffffffff803fcbc0 R14: 000005aeddfffe33 R15: ffffffff80354e48
FS:  0000000041415960(0063) GS:ffffffff804f0f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aab648cf6a0 CR3: 000000043bccb000 CR4: 00000000000006e0
Process l502.exe (pid: 6744, threadinfo ffff81033fbd8000, task ffff810343ba21f0)
Stack: ffffffff8010f4cd 0000000000000000 ffffffff8011532a 00000000000020a8 
       0000000000000005 0000000000000014 ffff81014388cf58 0000000000000415 
       0000000000000000 0000000000000001 
Call Trace: <#MC> <ffffffff8010f4cd>{oops_begin+45} <ffffffff8011532a>{mce_panic+42}
       <ffffffff80115810>{do_machine_check+1088}
<ffffffff8010ee4b>{machine_check+127}
        <EOE> 

Code: 80 3f 00 7e f9 e9 1d fd ff ff f3 90 80 3f 00 7e f9 e9 4a fd 
console shuts up ...
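
To compare captures like the two above, the "Machine Check Exception" lines can be pulled out of a saved serial-console log. A minimal sketch, assuming the line format seen in these logs; the names MCE_RE, summarize_mce and SAMPLE are hypothetical, and the hex value printed before "Bank" is skipped rather than interpreted:

```python
import re
from collections import Counter

# Matches lines such as:
#   CPU 28: Machine Check Exception:                4 Bank 4: b200000000070f0f
# The hex value between "Exception:" and "Bank" is matched but not captured.
MCE_RE = re.compile(
    r"CPU (\d+): Machine Check Exception:\s+[0-9a-f]+ Bank (\d+): ([0-9a-f]{16})"
)


def summarize_mce(log_text):
    """Return (sorted reporting CPUs, Counter of (bank, status) pairs)."""
    cpus = []
    signatures = Counter()
    for match in MCE_RE.finditer(log_text):
        cpus.append(int(match.group(1)))
        signatures[(int(match.group(2)), match.group(3))] += 1
    return sorted(cpus), signatures


# Two lines copied from the 2.6.13.3 capture, used as sample input.
SAMPLE = """\
CPU 28: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aeddffdae9
CPU 24: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 5aeddffe42a
"""

if __name__ == "__main__":
    cpus, signatures = summarize_mce(SAMPLE)
    print("CPUs reporting an MCE:", cpus)
    for (bank, status), count in signatures.items():
        print("bank", bank, "status", status, "seen", count, "time(s)")
```

Run against the full captures, a summary like this makes it easy to see whether different kernels report the same bank and status word on the same CPUs.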
Comment 3 Avuton Olrich 2006-01-19 01:31:12 UTC
I've had this same problem, along with many others. I have been told by ac and
others that this is most likely a hardware error, although I have changed _all_
hardware except my CPU (I can't afford another one). I'm using a Gigabyte board
and an Athlon 64 X2.

I have spoken to one person who experienced this MCE and successfully
got rid of it by changing the motherboard; I tried that and it didn't work.
 
CPU 0: Machine Check Exception:                4 Bank 4: b200000000070f0f  
TSC 525db1c705d   
Kernel panic - not syncing: Machine check  
  
http://lkml.org/lkml/2005/11/19/21  
Comment 4 Natalie Protasevich 2007-08-25 22:55:25 UTC
Mark, are you still having this problem? Does it occur with newer kernels?
If so, please tell us more about the chemistry software you mentioned, and whether you see any problems when you don't run the program.
Thanks.
Comment 5 Mark Williamson 2007-09-02 13:12:03 UTC
I cannot reproduce this bug with recent kernels. More seriously, some suspect RAM was found in this machine, so I am not sure how much this bug report can be trusted now. I think it's probably best to close it and reopen it if the problem is seen again.

Thanks for all the help
Comment 6 Natalie Protasevich 2007-09-03 21:02:13 UTC
OK, sounds good. Let's close it for now then.
Thanks.
