Bug 5272 - 100% Reproducible bug on x86_64 SMP Tyan S2892 Thunder K8SE
Summary: 100% Reproducible bug on x86_64 SMP Tyan S2892 Thunder K8SE
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: i386 Linux
Importance: P2 blocking
Assignee: Andi Kleen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-17 09:11 UTC by Vladimir Kangin
Modified: 2006-08-03 05:11 UTC

See Also:
Kernel Version: 2.6.12-1.1447_FC4smp
Subsystem:
Regression: ---
Bisected commit-id:


Attachments

Description Vladimir Kangin 2005-09-17 09:11:05 UTC
Most recent kernel where this bug did not occur: I haven't tried one, but
according to the documentation RHEL3 Update 4 and SLES 9 work well.

Distribution: Fedora

Hardware Environment: Tyan S2892 Thunder K8SE with 2x AMD Opteron 252, tested
with a few different disk and memory configurations. I am 99.9% sure the problem
is related to the motherboard: I tried replacing the memory and swapping the
SATA HDD for a PATA one, with no luck.

Software Environment: Default server installation.

Problem Description: No problems were discovered right after the Linux
installation, but as soon as some load is generated on the server, the
following Oops happens:

Sep 16 13:15:27 s1-ams kernel: loop: loaded (max 8 devices)
Sep 16 13:15:44 s1-ams kernel: Unable to handle kernel paging request at
0000000000004b20 RIP:
Sep 16 13:15:44 s1-ams kernel: <ffffffff8016a791>{pte_alloc_map+144}
Sep 16 13:15:44 s1-ams kernel: PGD 124f09067 PUD 122041067 PMD 0
Sep 16 13:15:44 s1-ams kernel: Oops: 0000 [1] SMP
Sep 16 13:15:44 s1-ams kernel: CPU 1
Sep 16 13:15:44 s1-ams kernel: Modules linked in: loop md5 ipv6 parport_pc lp
parport autofs4 rfcomm l2cap bluetooth sunrpc pcmcia yenta_socket rsrc_nonstatic
pcmcia_core ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables video
button battery ac ohci_hcd ehci_hcd i2c_nforce2 i2c_core e100 mii tg3
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_nv libata 3w_9xxx sd_mod scsi_mod
Sep 16 13:15:44 s1-ams kernel: Pid: 3183, comm: depmod Not tainted
2.6.12-1.1447_FC4smp
Sep 16 13:15:44 s1-ams kernel: RIP: 0010:[<ffffffff8016a791>]
<ffffffff8016a791>{pte_alloc_map+144}
Sep 16 13:15:44 s1-ams kernel: RSP: 0000:ffff810122701db8  EFLAGS: 00010213
Sep 16 13:15:44 s1-ams kernel: RAX: 0000000000000003 RBX: 0000000001800e48 RCX:
0000000000000000
Sep 16 13:15:44 s1-ams kernel: RDX: ffff810120d8e000 RSI: ffff81000000f400 RDI:
ffff81000000e000
Sep 16 13:15:44 s1-ams kernel: RBP: ffff810122041060 R08: 0000000000000282 R09:
0000000000000000
Sep 16 13:15:44 s1-ams kernel: R10: 0000000000000000 R11: 0000000000000001 R12:
ffff81013c535b40
Sep 16 13:15:44 s1-ams kernel: R13: ffff81013c535b40 R14: ffff81013c535bb8 R15:
ffff81013c535b40
Sep 16 13:15:44 s1-ams kernel: FS:  00002aaaaadfc6e0(0000)
GS:ffffffff804e7900(0000) knlGS:0000000000000000
Sep 16 13:15:44 s1-ams kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 16 13:15:44 s1-ams kernel: CR2: 0000000000004b20 CR3: 000000012200f000 CR4:
00000000000006e0
Sep 16 13:15:44 s1-ams kernel: Process depmod (pid: 3183, threadinfo
ffff810122700000, task ffff81013c058130)
Sep 16 13:15:44 s1-ams kernel: Stack: ffff81012200f000 0000000001800e48
ffff81012c471938 ffff81013c535b40
Sep 16 13:15:44 s1-ams kernel:        ffff810122041060 ffffffff8016d8df
ffff81013c535bb8 ffffffff8015e9b2
Sep 16 13:15:44 s1-ams kernel:        0000000100541648 0000000000000292
Sep 16 13:15:44 s1-ams kernel: Call
Trace:<ffffffff8016d8df>{handle_mm_fault+272}
<ffffffff8015e9b2>{generic_file_aio_read+48}
Sep 16 13:15:45 s1-ams kernel:        <ffffffff80122615>{do_page_fault+1164}
<ffffffff8010f25d>{error_exit+0}
Sep 16 13:15:45 s1-ams kernel:
Sep 16 13:15:45 s1-ams kernel:
Sep 16 13:15:45 s1-ams kernel: Code: 48 8b b1 20 4b 00 00 48 b8 ff ff ff 7f ff
ff ff ff 48 39 c2
Sep 16 13:15:45 s1-ams kernel: RIP <ffffffff8016a791>{pte_alloc_map+144} RSP
<ffff810122701db8>
Sep 16 13:15:45 s1-ams kernel: CR2: 0000000000004b20
Sep 16 13:15:45 s1-ams kernel:  <3>Debug: sleeping function called from invalid
context at include/linux/rwsem.h:43
Sep 16 13:15:45 s1-ams kernel:  md5 ipv6 parport_pc lp parport autofs4 rfcomm
l2cap bluetooth sunrpc pcmcia yenta_socket rsrc_nonstatic pcmcia_core ipt_REJECT
ipt_state ip_conntrack iptable_filter ip_tables video button battery ac ohci_hcd
ehci_hcd i2c_nforce2 i2c_core e100 mii tg3 dm_snapshot dm_zero dm_mirror ext3
jbd dm_mod sata_nv libata 3w_9xxx sd_mod scsi_mod
Sep 16 13:15:45 s1-ams kernel: Pid: 3183, comm: depmod Not tainted
2.6.12-1.1447_FC4smp
Sep 16 13:15:45 s1-ams kernel: RIP: 0010:[<ffffffff801339ea>]
<ffffffff801339ea>{mm_release+72}
Sep 16 13:15:45 s1-ams kernel: RSP: 0000:ffff8101226e14d8  EFLAGS: 00010206
Sep 16 13:15:45 s1-ams kernel: RAX: ffff810037d30e00 RBX: ffff81013c058130 RCX:
0000000000000000
Sep 16 13:15:45 s1-ams kernel: RDX: ffff81013c058100 RSI: 0000000000000000 RDI:
00002aaaaaf00ff0
Sep 16 13:15:45 s1-ams kernel: RBP: 0000000000000000 R08: 0000000000000720 R09:
ffff8100000bb982
Sep 16 13:15:45 s1-ams kernel: R10: 0000000000000000 R11: ffffffff80208cfd R12:
0000000000000000
Sep 16 13:15:45 s1-ams kernel: R13: 0000000000000000 R14: 0000000000000009 R15:
0000000000000000
Sep 16 13:15:45 s1-ams kernel: FS:  00002aaaaadfc6e0(0000)
GS:ffffffff804e7900(0000) knlGS:0000000000000000
Sep 16 13:15:45 s1-ams kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 16 13:15:45 s1-ams kernel: CR2: 0000000000000048 CR3: 0000000000101000 CR4:
00000000000006e0
Sep 16 13:15:45 s1-ams kernel: Process depmod (pid: 3183, threadinfo
ffff810122700000, task ffff81013c058130)
Sep 16 13:15:45 s1-ams kernel: Stack: 0000000000000000 0000000000000000
ffff81013c058130 0000000000000000
Sep 16 13:15:45 s1-ams kernel:        0000000000000000 ffffffff80137f72
ffffffff80407464 0000000000000000
Sep 16 13:15:45 s1-ams kernel:        ffff81013c058130 0000000000000000
Sep 16 13:15:46 s1-ams kernel: Call Trace:<ffffffff80137f72>{exit_mm+28}
<ffffffff80138bc1>{do_exit+381}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff8024c13e>{do_unblank_screen+40}
<ffffffff801228d9>{do_page_fault+1872}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff80208dd9>{vgacon_cursor+220}
<ffffffff8024a9dc>{vt_console_print+577}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff8010f25d>{error_exit+0}
<ffffffff80208cfd>{vgacon_cursor+0}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff801339ea>{mm_release+72}
<ffffffff801339c0>{mm_release+30}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff80137f72>{exit_mm+28}
<ffffffff80138bc1>{do_exit+381}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff8024c13e>{do_unblank_screen+40}
<ffffffff801228d9>{do_page_fault+1872}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff80208dd9>{vgacon_cursor+220}
<ffffffff8024a9dc>{vt_console_print+577}
Sep 16 13:15:46 s1-ams kernel:        <ffffffff8010f25d>{error_exit+0}
<ffffffff80208cfd>{vgacon_cursor+0}


Steps to reproduce:
1) Install a default Fedora Core 4 server installation on a Tyan S2892 Thunder
K8SE with 2x AMD Opteron 252.

2) After installation, any attempt to download a file larger than 500MB with
wget causes the problem. yum update also gets stuck with the same Oops at the
"Apply Transactions" stage.

Please note that the error is always one of the following two:
kernel: Unable to handle kernel paging request at 0000000000004b20 RIP:
(causes a hard crash immediately)
or
kernel: Unable to handle kernel paging request at 0000000000004b30 RIP:
(causes a crash, but the system stays pingable for a period of time)
Comment 1 Vladimir Kangin 2005-09-17 09:27:41 UTC
I need to mention that the same Oops is reproducible with kernel 2.6.11-1.1369_FC4smp.

vkangin
Comment 2 Andi Kleen 2005-09-17 11:54:33 UTC
Does it happen with 2.6.14rc1?  
  
  
Comment 3 Vladimir Kangin 2005-09-17 12:02:36 UTC
I haven't tried compiling a custom 2.6.14rc1 kernel.
Do you think it is absolutely necessary to install kernel 2.6.14rc1?
I could try, but it seems any activity causes an Oops on the system, so that
wouldn't be an easy task :(

I guess it is possible that some default BIOS setting on the Tyan board causes
this problem. Does the "paging request at 0000000000004b20 RIP" tell you anything?

Thanks in advance,
Vladimir Kangin
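
On what the fault address says: the following is a minimal sketch, not a diagnosis. It assumes that the bytes on the "Code:" line of the first Oops begin at the faulting RIP, and it hand-decodes only the one instruction form seen there (REX.W, opcode 8B, ModRM with a 32-bit displacement and no SIB byte). The helper name decode_mov_load and the register table are made up for illustration.

```python
# First seven bytes from the "Code:" line of the first Oops.
code = bytes.fromhex("48 8b b1 20 4b 00 00")

# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B extension).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(insn, regs):
    """Decode 'mov r64, [base + disp32]' and compute the accessed address.

    Handles only REX.W (0x48) + opcode 0x8B + ModRM with mod=10 and no SIB;
    anything else is rejected rather than guessed at.
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without SIB is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    dest, base = REGS64[reg], REGS64[rm]
    addr = (regs[base] + disp) & (2**64 - 1)
    return dest, base, disp, addr


# RCX is 0000000000000000 in the register dump of the first Oops.
dest, base, disp, addr = decode_mov_load(code, {"rcx": 0x0})
print(f"mov {dest}, [{base}+{disp:#x}] -> accesses {addr:#018x}")
```

Under those assumptions the instruction is a load through RCX at offset 0x4b20, and with RCX equal to zero the accessed address equals the reported CR2 value. That is consistent with a NULL or zeroed base pointer being dereferenced at a fixed structure offset, but it does not by itself show whether the BIOS, the hardware or the kernel is responsible.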
Comment 4 Vladimir Kangin 2005-09-17 12:10:38 UTC
I've found that kernel 2.6.13 is available from the development branch of the
Fedora Core 4 repo. Do you think we should try it?

Regards,
Vladimir Kangin
Comment 5 Andi Kleen 2005-09-17 12:25:45 UTC
In this bugzilla we only handle mainline kernels; that is, currently
2.6.14rc1.

If you want to report Fedora-specific issues for Fedora kernels,
please use the Fedora bugzilla.
Comment 6 Vladimir Kangin 2005-09-17 14:08:48 UTC
A Fedora bugzilla issue has been filed:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=168605
Comment 7 Adrian Bunk 2006-08-03 05:11:37 UTC
Please reopen this bug if it's still present in kernel 2.6.17.