Bug 4851 - x86-64 userspace random segfaults and protection errors
Status: CLOSED PATCH_ALREADY_AVAILABLE
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 high
Assigned To: Andrew Morton
Duplicates: 4991
Reported: 2005-07-05 15:10 UTC by Bongani Hlope
Modified: 2006-04-22 10:27 UTC

See Also:
Kernel Version: 2.6.11-mm1 - 2.6.13
Tree: Mainline
Regression: ---


Attachments
2.6.13-rc4 .config file (18.74 KB, text/plain)
2005-07-30 10:50 UTC, Chris Caputo
Patch to arch/x86_64/kernel/traps.c to make it so register/code dump happens. (1.83 KB, patch)
2005-07-30 11:29 UTC, Chris Caputo
Gentoo forum thread on the same subject. (224 bytes, text/html)
2005-08-02 10:57 UTC, Chris Caputo
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes. (3.45 KB, patch)
2005-08-10 12:34 UTC, Chris Caputo
Dump maps of failing program; this is against 2.6.13-rc5 (8.41 KB, patch)
2005-08-10 13:08 UTC, Bongani Hlope
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes. (3.43 KB, patch)
2005-08-10 21:56 UTC, Chris Caputo

Description Bongani Hlope 2005-07-05 15:10:22 UTC
Distribution:  
Mandrake 10.2/Mandriva 2005, x86-64 version
     
Hardware Environment:  
Dual Opteron 244, 2GB Memory     
lspci -v     
00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge     
(rev 01)     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, 66Mhz, medium devsel, latency 8     
        Memory at 00000000f0000000 (32-bit, prefetchable) [size=128M]     
        Capabilities: <available only to root>     
     
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800 South]     
(prog-if 00 [Normal decode])     
        Flags: bus master, 66Mhz, medium devsel, latency 0     
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0     
        Memory behind bridge: f8000000-f9ffffff     
        Prefetchable memory behind bridge: e0000000-efffffff     
        Capabilities: <available only to root>     
     
00:05.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture     
(rev 11)     
        Flags: bus master, medium devsel, latency 32, IRQ 185     
        Memory at 00000000fa015000 (32-bit, prefetchable) [size=4K]     
        Capabilities: <available only to root>     
     
00:05.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev     
11)     
        Flags: bus master, medium devsel, latency 32, IRQ 10     
        Memory at 00000000fa016000 (32-bit, prefetchable) [size=4K]     
        Capabilities: <available only to root>     
     
00:08.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)     
        Subsystem: Creative Labs SB Audigy 2 ZS (SB0350)     
        Flags: bus master, medium devsel, latency 32, IRQ 193     
        I/O ports at 9000 [size=64]     
        Capabilities: <available only to root>     
     
00:08.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev     
04)     
        Subsystem: Creative Labs SB Audigy MIDI/Game Port     
        Flags: bus master, medium devsel, latency 32     
        I/O ports at 9400 [size=8]     
        Capabilities: <available only to root>     
     
00:08.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)     
(prog-if 10 [OHCI])     
        Subsystem: Creative Labs SB Audigy FireWire Port     
        Flags: bus master, medium devsel, latency 32, IRQ 185     
        Memory at 00000000fa014000 (32-bit, non-prefetchable) [size=2K]     
        Memory at 00000000fa010000 (32-bit, non-prefetchable) [size=16K]     
        Capabilities: <available only to root>     
     
00:0b.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 Gigabit     
Ethernet (rev 03)     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 185     
        Memory at 00000000fa000000 (64-bit, non-prefetchable) [size=64K]     
        Capabilities: <available only to root>     
     
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID     
Controller (rev 80)     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, medium devsel, latency 32, IRQ 169     
        I/O ports at 9800 [size=8]     
        I/O ports at 9c00 [size=4]     
        I/O ports at a000 [size=8]     
        I/O ports at a400 [size=4]     
        I/O ports at a800 [size=16]     
        I/O ports at ac00 [size=256]     
        Capabilities: <available only to root>     
     
00:0f.1 IDE interface: VIA Technologies, Inc.     
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a     
[Master SecP PriP])     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, medium devsel, latency 32, IRQ 169     
        I/O ports at b000 [size=16]     
        Capabilities: <available only to root>     
     
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1     
Controller (rev 81) (prog-if 00 [UHCI])     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, medium devsel, latency 32, IRQ 177     
        I/O ports at b400 [size=32]     
        Capabilities: <available only to root>     
     
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1     
Controller (rev 81) (prog-if 00 [UHCI])     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, medium devsel, latency 32, IRQ 177     
        I/O ports at b800 [size=32]     
        Capabilities: <available only to root>     
     
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1     
Controller (rev 81) (prog-if 00 [UHCI])     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, medium devsel, latency 32, IRQ 177     
        I/O ports at bc00 [size=32]     
        Capabilities: <available only to root>     
     
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20     
[EHCI])     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, medium devsel, latency 32, IRQ 177     
        Memory at 00000000fa017000 (32-bit, non-prefetchable) [size=256]     
        Capabilities: <available only to root>     
     
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800     
South]     
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 1300     
        Flags: bus master, stepping, medium devsel, latency 0     
        Capabilities: <available only to root>     
     
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]     
HyperTransport Technology Configuration     
        Flags: fast devsel     
        Capabilities: <available only to root>     
     
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]     
Address Map     
        Flags: fast devsel     
     
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM     
Controller     
        Flags: fast devsel     
     
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]     
Miscellaneous Control     
        Flags: fast devsel     
     
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]     
HyperTransport Technology Configuration     
        Flags: fast devsel     
        Capabilities: <available only to root>     
     
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]     
Address Map     
        Flags: fast devsel     
     
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM     
Controller     
        Flags: fast devsel     
     
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]     
Miscellaneous Control     
        Flags: fast devsel     
     
01:00.0 VGA compatible controller: nVidia Corporation NV36 [GeForce FX 5700LE]     
(rev a1) (prog-if 00 [VGA])     
        Subsystem: Giga-byte Technology: Unknown device 310c     
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 10     
        Memory at 00000000f8000000 (32-bit, non-prefetchable) [size=16M]     
        Memory at 00000000e0000000 (32-bit, prefetchable) [size=256M]     
        Expansion ROM at <unassigned> [disabled] [size=128K]     
        Capabilities: <available only to root>     
     
Software Environment: 
If some fields are empty or look unusual you may have an old version.     
Compare to the current minimal requirements in Documentation/Changes.     
     
Linux bongani64 2.6.12 #21 SMP Sun Jun 26 21:34:38 SAST 2005 x86_64 AMD     
Opteron(tm) Processor 244 unknown GNU/Linux     
     
Gnu C                  3.4.3     
Gnu make               3.80     
binutils               2.15.92.0.2     
util-linux             2.12a     
mount                  2.12a     
module-init-tools      3.0     
e2fsprogs              1.36     
reiserfsprogs          line     
reiser4progs           line     
nfs-utils              1.0.7     
Linux C Library        2.3.4     
Dynamic linker (ldd)   2.3.4     
Procps                 3.2.5     
Net-tools              1.60     
Console-tools          0.2.3     
Sh-utils               5.2.1     
udev                   054     
Modules Loaded         snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq     
snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device     
snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd     
soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394     
loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class     
i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata     
scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac     
     
     
Problem Description: 
User-space applications (mostly gcc, g++, collect, sed and grep) randomly
segfault and cause protection errors. The problem goes away when I revert the
randomisation-top-of-stack-randomization patch or when I run the following
command: echo 0 > /proc/sys/kernel/randomize_va_space
 
     
Steps to reproduce:  
Building the kernel or KDE using make -j4
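
The workaround above toggles a sysctl, so it helps to record its value with every test run. A minimal Python sketch follows; the helper names are hypothetical, and the value meanings are my reading of the kernel sysctl documentation (value 2 only exists on kernels newer than the ones discussed here):

```python
from pathlib import Path

# File written by the workaround in the report.
SYSCTL_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Assumed meanings, per the kernel sysctl documentation.
MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "as 1, plus heap (brk) randomized",
}


def describe_mode(raw: str) -> str:
    """Turn the raw file contents (for example '1') into a readable description."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown value {value}")


def current_mode(path: Path = SYSCTL_PATH) -> str:
    """Read the live setting; needs a Linux /proc filesystem."""
    return describe_mode(path.read_text())
```

Calling current_mode() before and after "echo 0 > /proc/sys/kernel/randomize_va_space" should show the setting change from mode 1 to "disabled".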
Comment 1 Arjan van de Ven 2005-07-07 14:27:22 UTC
To diagnose this at all a /proc/<pid>/maps at the moment of crash is needed.
Can you run one of these commands inside gdb (several times if needed) until one
crashes?
Comment 2 Arjan van de Ven 2005-07-07 14:28:04 UTC
(just as paranoia check, please confirm you are not overclocking your system,
are using the stock unmodified kernels and are not using binary or otherwise
external kernel modules)
Comment 3 Bongani Hlope 2005-07-07 21:54:21 UTC
No, the system is not overclocked, and these are the default 2.6.11-mm1 through
2.6.12 kernels. No binary modules are used.
Comment 4 Andrew Morton 2005-07-28 23:22:35 UTC
Where do we stand with this one?  Still present in
2.6.13-rc4?

Comment 5 Chris Caputo 2005-07-29 15:13:03 UTC
Not sure if this is useful or not, but I have also seen the problem described
above with 2.6.12.3 while using a single Opteron 275 (dual-core) with no
over-clocking.  Tyan S2892 K8SE motherboard with 8 gigs of RAM.

With limited testing I can say that setting kernel.randomize_va_space to zero
caused the problem to not happen anymore.
Comment 6 Chris Caputo 2005-07-30 03:50:25 UTC
Andrew, I confirm that this bug is still present in 2.6.13-rc4 and also that
"sysctl kernel.randomize_va_space=0" prevents the bug from happening.

My repro method is to compile gdb and nmap at the same time, using an Opteron
275.  Results in:

chmod[17460] general protection rip:404207 rsp:7fffff906750 error:0
chmod[19921] general protection rip:2aaaaaaac274 rsp:7fffff9bda70 error:0
cat[26314] general protection rip:2aaaaaaac274 rsp:7fffffdbced0 error:0
gcc[26444] general protection rip:2aaaaaaac274 rsp:7fffff9bd740 error:0
grep[26929] general protection rip:2aaaaaaac274 rsp:7ffffffbdd10 error:0
cat[1764] general protection rip:2aaaaaaac274 rsp:7fffffbbe900 error:0
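
Five of the six faults above share one RIP. A small Python sketch for tallying such kernel messages is shown below; the regular expression is fitted only to the lines quoted here and may need adjusting for other kernels, and the function names are hypothetical:

```python
import re
from collections import Counter

# Format of the messages quoted above:
#   comm[pid] general protection rip:<hex> rsp:<hex> error:<n>
FAULT_RE = re.compile(
    r"^(?P<comm>\S+)\[(?P<pid>\d+)\] general protection "
    r"rip:(?P<rip>[0-9a-f]+) rsp:(?P<rsp>[0-9a-f]+) error:(?P<error>\d+)$"
)

SAMPLE_LOG = """\
chmod[17460] general protection rip:404207 rsp:7fffff906750 error:0
chmod[19921] general protection rip:2aaaaaaac274 rsp:7fffff9bda70 error:0
cat[26314] general protection rip:2aaaaaaac274 rsp:7fffffdbced0 error:0
gcc[26444] general protection rip:2aaaaaaac274 rsp:7fffff9bd740 error:0
grep[26929] general protection rip:2aaaaaaac274 rsp:7ffffffbdd10 error:0
cat[1764] general protection rip:2aaaaaaac274 rsp:7fffffbbe900 error:0
"""


def parse_faults(text):
    """Return one dict per recognised fault line; other lines are ignored."""
    faults = []
    for line in text.splitlines():
        match = FAULT_RE.match(line.strip())
        if match:
            faults.append({
                "comm": match["comm"],
                "pid": int(match["pid"]),
                "rip": int(match["rip"], 16),
                "rsp": int(match["rsp"], 16),
            })
    return faults


def tally_by_rip(faults):
    """Count how many faults occurred at each instruction pointer."""
    return Counter(fault["rip"] for fault in faults)


for rip, count in tally_by_rip(parse_faults(SAMPLE_LOG)).most_common():
    print(f"{rip:#x}: {count}")
```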
Comment 7 Alexander Nyberg 2005-07-30 04:04:32 UTC
Chris, could you put up your .config and say what distribution you use?

Thanks
Comment 8 Chris Caputo 2005-07-30 10:50:54 UTC
Created attachment 5423 [details]
2.6.13-rc4 .config file

Distribution: Gentoo
.config: attached
taint: arcmsr.ko Areca RAID controller patches from 2.6.13-rc3-mm3
Comment 9 Chris Caputo 2005-07-30 11:29:13 UTC
Created attachment 5424 [details]
Patch to arch/x86_64/kernel/traps.c to make it so register/code dump happens.

Here is additional info resulting from the application of this patch:

sed[5635] general protection rip:2aaaaaaac274 rsp:7ffffffbce50 error:0

Modules linked in:
Pid: 5635, comm: sed Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbce50  EFLAGS: 00010296
RAX: 93d1007b364ae0c6 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 000000596fc3e124 RSI: 93d1007b364ae0c6 RDI: 93d1007b363992ae
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbcec0 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000234dec000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21


x86_64-pc-linux[7506] general protection rip:2aaaaaaac274 rsp:7ffffffbe850
error:0

Modules linked in:
Pid: 7506, comm: x86_64-pc-linux Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbe850  EFLAGS: 00010282
RAX: cfcaf86d6672c8d9 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000005c2b559228 RSI: cfcaf86d6672c8d9 RDI: cfcaf86d66617ac1
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbe8c0 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000238d93000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21


rm[17664] general protection rip:2aaaaaaac274 rsp:7ffffffbdfd0 error:0

Modules linked in:
Pid: 17664, comm: rm Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbdfd0  EFLAGS: 00010292
RAX: 8ed0b837358cefd6 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 00000063a8cfa4f8 RSI: 8ed0b837358cefd6 RDI: 8ed0b837357ba1be
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbe040 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000230cc8000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
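
All three dumps fault on the same bytes, 48 8b 00, which comment 16 disassembles as mov (%rax),%rax, and in each case RAX holds what looks like garbage. One possible reading, sketched in Python below, is that the trap is reported as a general protection fault rather than a page fault because those RAX values are not canonical x86-64 addresses. The check assumes 48-bit virtual addresses, and is_canonical is a hypothetical helper:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to (va_bits - 1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# RAX values from the three dumps above (sed, x86_64-pc-linux, rm).
RAX_VALUES = [0x93d1007b364ae0c6, 0xcfcaf86d6672c8d9, 0x8ed0b837358cefd6]
RIP = 0x00002aaaaaaac274

for rax in RAX_VALUES:
    print(f"rax={rax:#018x} canonical={is_canonical(rax)}")
print(f"rip={RIP:#018x} canonical={is_canonical(RIP)}")
```

None of the RAX values pass the check while RIP does, which is consistent with a bad pointer being loaded rather than a bad jump.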
Comment 10 Bongani Hlope 2005-07-30 12:19:19 UTC
Hi Andrew  
  
I tested 2.6.13-rc4 with kernel.randomize_va_space=1 and the bug is still
there. I've been trying to implement show_map for when an application
segfaults; sorry for being quiet. I just saw the traps patch, so I'll apply it
and test.

I'm beginning to suspect that the problem might be caused by the size of the
command line that is passed to the applications that segfault. All the
applications that segfault usually take long command-line arguments; maybe with
the randomization this is not calculated correctly.
Comment 11 Chris Caputo 2005-07-30 12:53:22 UTC
RE: long command lines...  I am not seeing that.  I've seen a command as short
as "rm -f conftest.er1" result in a "general protection" trap.  This was
revealed by enabling core dumps ("sysctl kernel.core_uses_pid=1", "ulimit -c
unlimited") and using gdb to see the command line.
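
When the failing commands are launched from a script rather than an interactive shell, the core-size limit can be raised from inside the launcher. A minimal Python sketch follows, assuming a Linux system; enable_core_dumps is a hypothetical helper, and it only matches "ulimit -c unlimited" when the hard limit is itself unlimited:

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    The new limit is inherited by every child process started afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```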
Comment 12 Ulrich Drepper 2005-07-31 11:33:39 UTC
Ingo forwarded some of the mails.  Here's what I replied with.  If somebody can
reproduce it please provide this information.


If this is really the location this is all very curious.  But we can
investigate it.

What happens here is a call to elf_machine_load_address() computes the
load *offset* for the ld.so binary.  Yes, the function is misnamed but
only now that we have prelinking.

For a non-prelinked binary the returned value is in fact the load
address.  For a prelinked binary the value is the difference between the
prelink address and the actual load address.

Now, the elf_machine_dynamic() function returns the offset of the
dynamic section.  This is the simple offset in case the binary is not
prelinked, or the adjusted address otherwise.

Important to realize is that load offset + dynamic section offset
always points to the dynamic section.  In the non-prelinked case, the
load offset is the "address" and the dynamic section offset is a real
offset; in the prelinked case it is the other way around.

That's at least the theory.


If the crashes are where the reports say they are we should be able to
debug it.  So, please do the following after a crash (Debian people need
to adjust for their wrong paths):

1. determine whether the crash is really in _dl_start.  Just run

    readelf -s /lib64/ld-linux-x86-64.so|egrep '_dl_start$'

   This should give output like

 23: 000000354ee01390   1019 FUNC    LOCAL  DEFAULT        9 _dl_start

   This means the function _dl_start starts at 0x354ee01390 and is 1019
   bytes long.


2. If the crash is really in _dl_start and in the loop handling the
   dynamic section we need to look at the code.  Run

     objdump -Sr /lib64/ld-linux-x86-64.so|less

   and search for the definition of _dl_start.  It should look something
   like this (this is a prelinked ld.so):

000000354ee01390 <_dl_start>:
  354ee01390:   55                      push   %rbp
  354ee01391:   48 89 fd                mov    %rdi,%rbp
  354ee01394:   48 83 ec 10             sub    $0x10,%rsp
  354ee01398:   0f 31                   rdtsc
  354ee0139a:   89 d2                   mov    %edx,%edx
  354ee0139c:   89 c0                   mov    %eax,%eax
  354ee0139e:   48 c1 e2 20             shl    $0x20,%rdx
  354ee013a2:   48 09 c2                or     %rax,%rdx
  354ee013a5:   48 8b 05 f4 97 11 00    mov    1153012(%rip),%rax        # 354ef1aba0 <_rtld_global+0xba0>
  354ee013ac:   48 8d 35 dd ff ff ff    lea    -35(%rip),%rsi        # 354ee01390 <_dl_start>
  354ee013b3:   48 29 c6                sub    %rax,%rsi
  354ee013b6:   48 89 f7                mov    %rsi,%rdi
  354ee013b9:   48 03 3d 18 8c 11 00    add    1149976(%rip),%rdi        # 354ef19fd8 <_GLOBAL_OFFSET_TABLE_+0x40>

   The important part starts at 354ee013ac.  We need to look at the
   variables referenced.  First there is the little variable in .data.
   So

     objdump -j .data -s /lib64/ld-linux-x86-64.so|less

   and look for address 354ef1aba0 (see the comment at the end of the
   instruction, this is the effective address):

 354ef1aba0 9013e04e 35000000                    ...N5...

   This means the value is 0x354ee01390.  Now look at the _DYNAMIC
   address, which is computed using the GOT (instruction at
   0x354ee013b9, variable at 0x354ef19fd8):

    objdump -j .got -s /lib64/ld-linux-x86-64.so|less

   For me this shows:

 354ef19fd8 189ef14e 35000000 00adf14e 35000000  ...N5......N5...

   I.e., the address of _DYNAMIC is 0x354ef19e18.


   If the load address matches the prelink address, the value in %rsi
   after the sub at address 0x354ee013b3 is zero and zero is added to
   the _DYNAMIC address (which better is correct).

   If the load address does not match prelink address the computed
   dynamic section address is

    real _dl_start address - _dl_start after prelink
        + _DYNAMIC after prelink

   The "wrong" load address in the second and third term cancel each
   other out.
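
The computation in step 2 can be written out as plain arithmetic. The Python sketch below uses the prelinked values quoted above; the function name and the 0x10000000 shift are illustrative assumptions, not anything taken from glibc:

```python
def dynamic_section_address(real_dl_start, link_dl_start, link_dynamic):
    """real _dl_start - link-time _dl_start + link-time _DYNAMIC."""
    load_offset = real_dl_start - link_dl_start
    return load_offset + link_dynamic


# Link-time (prelinked) values from the walk-through above.
LINK_DL_START = 0x354ee01390
LINK_DYNAMIC = 0x354ef19e18

# Case 1: loaded exactly at the prelink address, so the offset is zero.
at_prelink = dynamic_section_address(LINK_DL_START, LINK_DL_START, LINK_DYNAMIC)

# Case 2: loaded somewhere else (the shift is an arbitrary example value).
shift = 0x10000000
moved = dynamic_section_address(LINK_DL_START + shift, LINK_DL_START, LINK_DYNAMIC)

print(hex(at_prelink))  # 0x354ef19e18
print(hex(moved))       # 0x355ef19e18
```

In both cases the result moves by exactly the load shift, which is the cancellation described in the text.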


This information should hopefully enable somebody to examine the crash
in more detail.  _dl_start is really almost the first code which is run.
The _start function, which is the entry point, begins with

     a30:       48 89 e7                mov    %rsp,%rdi
     a33:       e8 18 11 00 00          callq  1b50 <_dl_start>
Comment 13 Ulrich Drepper 2005-07-31 12:24:57 UTC
Another thing, has this problem been shown with Intel processors?  Or other
motherboards?  I ask because we have other strange bugs related to dual opterons:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=160341

This bug also looks like some kind of memory or address corruption, since the
kernel certainly creates the auxiliary vector even though the userland code
doesn't see it.

To me this smells almost like a hardware bug.
Comment 14 Chris Caputo 2005-07-31 12:34:14 UTC
I'm using a Tyan 2892 mobo with a dual-core Opteron 275.

Bongani, what motherboard are you using with the dual-proc Opteron 244's?

Ulrich, you might be right about it being hardware, but I'd be surprised since
the  problem doesn't show up when kernel.randomize_va_space is set to 0.
Comment 15 Bongani Hlope 2005-07-31 12:47:27 UTC
My motherboard is an MSI K8T Master2 FAR. But this board survived 8 hours of
memtest+ before I reported the bug.
Comment 16 Chris Caputo 2005-07-31 13:01:04 UTC
[apologies for the length of this]

I replaced my /lib64/ld-2.3.5.so which was lacking symbols (was stripped) 
with one with symbols.  Rebooted and reproduced bug again, by compiling 
gdb and nmap at the same time.  Here's a walk-through to summarize what I 
see...

[dmesg snip of one of the faults:]
rm[7790] general protection rip:2aaaaaaac274 rsp:7ffffffbdd20 error:0
Modules linked in:
Pid: 7790, comm: rm Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbdd20  EFLAGS: 00010292
RAX: 989efc3b3b47a1d3 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 00000093d0db7607 RSI: 989efc3b3b47a1d3 RDI: 989efc3b3b3653bb
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbdd90 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff80538880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000237755000 CR4: 00000000000006e0
  48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21 00 00 70 bd 
31 00 00 00 41 bb ff fd ff 6f 41 bc 34 fe ff 6f bb ff fe ff 6f 41 bd 40 ff 
ff 6f eb 17 66 66 66 90 66 66 90

---

# gdb rm core.7790
[...]
Core was generated by `rm -f conftest.er1'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaaaaac274 in ?? ()

(gdb) x/10i $rip
0x2aaaaaaac274: Cannot access memory at address 0x2aaaaaaac274

(gdb) bt
#0  0x00002aaaaaaac274 in ?? ()
#1  0x0000000000000000 in ?? ()
#2  0x0000000000000000 in ?? ()
[...]
#2138 0x006d722f6e69622f in ?? ()
#2139 0x006d722f6e69622f in ?? ()
#2140 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7ffffffc2000

---

Simple test program (x.c).  Bytes match dmesg byte dump above.  Just a 
sanity check to make sure it really is ld.so:

#include <stdio.h>
char x[] = {   /* bytes at the faulting RIP, copied from the dmesg dump above */
   0x48,0x8b,0x00,0x48,0x85,0xc0,0x74,0x74,0x48,0x89,0xc2,0x41,0xb9,0xff,
   0xff,0xff,0x6f,0x41,0xba,0x21,0x00,0x00,0x70,0xbd,0x31,0x00,0x00,0x00,
   0x41,0xbb,0xff,0xfd,0xff,0x6f,0x41,0xbc,0x34,0xfe,0xff,0x6f,0xbb,0xff,
   0xfe,0xff,0x6f,0x41,0xbd,0x40,0xff,0xff,0x6f,0xeb,0x17,0x66,0x66,0x66,
   0x90,0x66,0x66,0x90
};

int main(void)
{
   printf("Hello\n");   /* x is never executed; it exists only to be disassembled in gdb */
   return 0;
}
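
The byte array in a test program like this one does not have to be typed by hand. The Python sketch below converts the hex bytes printed by the patched traps.c into a C initializer; dump_to_c_array is a hypothetical helper, and the 14-bytes-per-row layout is chosen only to mirror the listing above:

```python
def dump_to_c_array(dump: str, name: str = "x", per_line: int = 14) -> str:
    """Convert whitespace-separated hex bytes into a C char array definition."""
    data = bytes.fromhex("".join(dump.split()))
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        rows.append("   " + ",".join(f"0x{b:02x}" for b in chunk))
    return f"char {name}[] = {{\n" + ",\n".join(rows) + "\n};"


# Code bytes from the dmesg snippet earlier in this comment.
DMESG_BYTES = """
  48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21 00 00 70 bd
31 00 00 00 41 bb ff fd ff 6f 41 bc 34 fe ff 6f bb ff fe ff 6f 41 bd 40 ff
ff 6f eb 17 66 66 66 90 66 66 90
"""

print(dump_to_c_array(DMESG_BYTES))
```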

---

gcc -o x -g x.c

gdb x
(gdb) b *0x00002aaaaaaac274
Breakpoint 1 at 0x2aaaaaaac274
(gdb) b main
Breakpoint 2 at 0x4004bc: file x.c, line 12.
(gdb) run
Starting program: /root/x

Breakpoint 1, 0x00002aaaaaaac274 in ?? ()
(gdb) x/10i $rip
0x2aaaaaaac274: mov    (%rax),%rax
0x2aaaaaaac277: test   %rax,%rax
0x2aaaaaaac27a: je     0x2aaaaaaac2f0
0x2aaaaaaac27c: mov    %rax,%rdx
0x2aaaaaaac27f: mov    $0x6fffffff,%r9d
0x2aaaaaaac285: mov    $0x70000021,%r10d
0x2aaaaaaac28b: mov    $0x31,%ebp
0x2aaaaaaac290: mov    $0x6ffffdff,%r11d
0x2aaaaaaac296: mov    $0x6ffffe34,%r12d
0x2aaaaaaac29c: mov    $0x6ffffeff,%ebx
(gdb) x/10i x   [char array from test program with assembly matching]
0x5008a0 <x>:   mov    (%rax),%rax
0x5008a3 <x+3>: test   %rax,%rax
0x5008a6 <x+6>: je     0x50091c
0x5008a8 <x+8>: mov    %rax,%rdx
0x5008ab <x+11>:        mov    $0x6fffffff,%r9d
0x5008b1 <x+17>:        mov    $0x70000021,%r10d
0x5008b7 <x+23>:        mov    $0x31,%ebp
0x5008bc <x+28>:        mov    $0x6ffffdff,%r11d
0x5008c2 <x+34>:        mov    $0x6ffffe34,%r12d
0x5008c8 <x+40>:        mov    $0x6ffffeff,%ebx

While gdb is running "ps" in another shell reveals:

root     18210  0.0  0.0    188    36 pts/1    T    19:14   0:00 /root/x

# cat /proc/18210/maps
00400000-00401000 r-xp 00000000 08:02 536783             /root/x
00500000-00501000 rw-p 00000000 08:02 536783             /root/x
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108     /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108     /lib64/ld-2.3.5.so
7ffffff40000-7ffffff56000 rw-p 7ffffff40000 00:00 0      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0  [vdso]

Noted that RIP is in /lib64/ld-2.3.5.so code.
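
The lookup done by eye above can be automated. A minimal Python sketch follows that parses the maps listing and reports which mapping contains a given address, together with the corresponding file offset; the function names are hypothetical and the parser handles only the six-column format shown here:

```python
MAPS = """\
00400000-00401000 r-xp 00000000 08:02 536783             /root/x
00500000-00501000 rw-p 00000000 08:02 536783             /root/x
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108     /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108     /lib64/ld-2.3.5.so
7ffffff40000-7ffffff56000 rw-p 7ffffff40000 00:00 0      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0  [vdso]
"""


def parse_maps(text):
    """Parse /proc/<pid>/maps text into a list of region dicts."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        regions.append({
            "start": start,
            "end": end,
            "perms": fields[1],
            "offset": int(fields[2], 16),
            "path": fields[5].strip() if len(fields) > 5 else "",
        })
    return regions


def find_region(regions, addr):
    """Return the region containing addr, or None if it is unmapped."""
    for region in regions:
        if region["start"] <= addr < region["end"]:
            return region
    return None


rip = 0x2aaaaaaac274
region = find_region(parse_maps(MAPS), rip)
file_offset = rip - region["start"] + region["offset"]
print(region["path"], hex(file_offset))  # /lib64/ld-2.3.5.so 0x1274
```

The resulting offset, 0x1274, is the value to look for in the objdump listing of ld.so.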

Per Ulrich's suggestions:

# readelf -s /lib64/ld-linux-x86-64.so.2|egrep '_dl_start$'
     23: 0000000000001220  1507 FUNC    LOCAL  DEFAULT    9 _dl_start
# objdump -Sr /lib64/ld-linux-x86-64.so.2|less
[skipped down to _dl_start]
0000000000001220 <_dl_start>:
     1220:       41 56                   push   %r14
     1222:       49 89 fe                mov    %rdi,%r14
     1225:       41 55                   push   %r13
     1227:       41 54                   push   %r12
     1229:       55                      push   %rbp
     122a:       53                      push   %rbx
     122b:       48 83 ec 40             sub    $0x40,%rsp
     122f:       0f 31                   rdtsc
     1231:       48 c1 e2 20             shl    $0x20,%rdx
     1235:       89 c0                   mov    %eax,%eax
     1237:       4c 8d 05 6a 41 11 00    lea    1130858(%rip),%r8
         # 1153a8 <_rtld_global+0x3a8>
     123e:       48 09 c2                or     %rax,%rdx
     1241:       48 8b 05 d8 45 11 00    mov    1131992(%rip),%rax
         # 115820 <_rtld_global+0x820>
     1248:       48 8d 3d d1 ff ff ff    lea    -47(%rip),%rdi
         # 1220 <_dl_start>
     124f:       48 29 c7                sub    %rax,%rdi
     1252:       48 89 f8                mov    %rdi,%rax
     1255:       48 03 05 7c 3d 11 00    add    1129852(%rip),%rax
         # 114fd8 <_GLOBAL_OFFSET_TABLE_+0x40>
     125c:       48 89 3d 05 41 11 00    mov    %rdi,1130757(%rip)
         # 115368 <_rtld_global+0x368>
     1263:       48 89 15 26 3b 11 00    mov    %rdx,1129254(%rip)
         # 114d90 <start_time>
     126a:       48 89 05 07 41 11 00    mov    %rax,1130759(%rip)
         # 115378 <_rtld_global+0x378>
     1271:       48 89 c6                mov    %rax,%rsi
     1274:       48 8b 00                mov    (%rax),%rax
     1277:       48 85 c0                test   %rax,%rax
     127a:       74 74                   je     12f0 <_dl_start+0xd0>
     127c:       48 89 c2                mov    %rax,%rdx
     127f:       41 b9 ff ff ff 6f       mov    $0x6fffffff,%r9d
     1285:       41 ba 21 00 00 70       mov    $0x70000021,%r10d
     128b:       bd 31 00 00 00          mov    $0x31,%ebp
     1290:       41 bb ff fd ff 6f       mov    $0x6ffffdff,%r11d
     1296:       41 bc 34 fe ff 6f       mov    $0x6ffffe34,%r12d
     129c:       bb ff fe ff 6f          mov    $0x6ffffeff,%ebx
     12a1:       41 bd 40 ff ff 6f       mov    $0x6fffff40,%r13d
     12a7:       eb 17                   jmp    12c0 <_dl_start+0xa0>
[...]

# objdump -j .data -s /lib64/ld-linux-x86-64.so.2| grep 115820
  115820 20120000 00000000                     .......
# objdump -j .got -s /lib64/ld-linux-x86-64.so.2| grep 114fd8
  114fd8 184e1100 00000000 00000000 00000000  .N..............
So _DYNAMIC equals 0x114e18.
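
The "objdump -s" rows show raw bytes in memory order, so on x86-64 each 8-byte value has to be read little-endian. A short Python sketch of that decoding, with decode_qword as a hypothetical helper name:

```python
import struct


def decode_qword(words: str) -> int:
    """Decode the first 8 bytes of an `objdump -s` row as a little-endian u64."""
    raw = bytes.fromhex(words.replace(" ", ""))
    return struct.unpack("<Q", raw[:8])[0]


print(hex(decode_qword("20120000 00000000")))  # 0x1220, the .data variable
print(hex(decode_qword("184e1100 00000000")))  # 0x114e18, _DYNAMIC
```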

But since I don't appear to be prelinked I am not sure what we can 
determine from this.

Comment 17 Ulrich Drepper 2005-07-31 13:56:07 UTC
The values as shown in the executable are fine.  You now need to look at the
appropriate memory location in the core file.  The addresses before relocation
are 0x115820 and 0x114fd8.  So you have to add the load address of the dynamic
linker to get the address at runtime.

The value should be the same as in the executable since the relocation hasn't
happened yet.

You should see the wrong value, content of %rax, as the result.  If not some
magic made the value appear in the register.  You have all the information
available to trace the value which should be in the register.
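
As a sketch of the address arithmetic described here, the Python snippet below adds the dynamic linker's load address to the two link-time addresses. The load base is taken from the r-xp ld-2.3.5.so mapping in the maps listing of comment 16; that choice and the labels are assumptions made for illustration:

```python
# Start of the ld-2.3.5.so r-xp mapping shown in comment 16 (assumed load base).
LOAD_BASE = 0x2aaaaaaab000

# Addresses before relocation, as given in the text.
LINK_TIME = {
    "variable in .data": 0x115820,
    "_DYNAMIC slot in .got": 0x114fd8,
}


def runtime_address(load_base, link_addr):
    """Address at which a link-time location appears in the running process."""
    return load_base + link_addr


for label, link_addr in LINK_TIME.items():
    print(f"{label}: x/xg {runtime_address(LOAD_BASE, link_addr):#x}")
```

The printed addresses are the ones to examine in the core file with gdb.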
Comment 18 Chris Caputo 2005-07-31 16:27:29 UTC
I checked the appropriate locations in the core file and did the math.  This
resulted in a sane value for RAX instead of the garbage value.

  Core was generated by `rm -f conftest.er1'.
  Program terminated with signal 11, Segmentation fault.
  #0  0x00002aaaaaaac274 in ?? ()
  (gdb) x/xg 0x2aaaaabc0820
  0x2aaaaabc0820: 0x0000000000001220
  (gdb) x/xg 0x2aaaaabbffd8
  0x2aaaaabbffd8: 0x0000000000114e18

0x2aaaaaaac220 - 0x1220 + 0x114e18 = 0x2AAAAABBFE18 (sane), but...

  (gdb) info registers rax
  rax            0x9d96be7b808c9bc7       -7091271125601313849

Is it possible the page with this data wasn't fully instantiated when the code
ran, but was fully instantiated by the time the core dump happened?  Not sure
how else to explain what is observed.  I could see that kind of a race condition
explaining why this is seen relatively infrequently.

Also, a question.  Why can't I see the ld.so code from the core dump?

  (gdb) x/10 $rip
  0x2aaaaaaac274: Cannot access memory at address 0x2aaaaaaac274
Comment 19 Linus Torvalds 2005-07-31 16:47:57 UTC

On Sun, 31 Jul 2005 bugme-daemon@kernel-bugs.osdl.org wrote:
> 
> Is it possible the page with this data wasn't fully instantiated when the code
> ran, but was fully instantiated by the time the core dump happened?  Not sure
> how else to explain what is observed.

TLB or page table initialization bugs?

> Also, a question.  Why can't I see the ld.so code from the core dump?
> 
>   (gdb) x/10 $rip
>   0x2aaaaaaac274: Cannot access memory at address 0x2aaaaaaac274

Possibly because it's file-backed:

	static int maydump(struct vm_area_struct *vma)
	...
		/* If it hasn't been written to, don't write it out */
		if (!vma->anon_vma)
			return 0;


the core-dump should have a pointer to the file, and gdb should be able to 
read it from there, no?

		Linus

Comment 20 Ingo Molnar 2005-08-01 01:00:22 UTC
almost sounds like a CPU bug. Randomization's effect could be that instead of
one given TLB layout, it pretty much does a complete search of all possible TLB
layouts, in terms of the hashing of virtual addresses used by ld.so (as far as
the stack pointer goes). So if the bug only happens with a certain virtual memory
layout, randomization will 'spread out' the likelihood of hitting the bug.

Is there any (strong) correlation between the CPU types used for this?
Comment 21 Ulrich Drepper 2005-08-01 01:13:19 UTC
Even if I include the RH bugzilla bug, the reported problems are restricted to
SMP machines with Opterons.

Maybe somebody who talks to AMD can get them to try to reproduce the problem.
Comment 22 Ingo Molnar 2005-08-01 13:45:59 UTC
another thing: does the bug only happen with the SMP kernel?
Comment 23 Chris Caputo 2005-08-01 14:12:04 UTC
On Mon, 1 Aug 2005, mingo@elte.hu wrote:
> Is there any (strong) correlation between the CPU types used for this?

I am not sure what you mean.  Right now we've got Bongani with a 
dual-Opteron 244 on an MSI K8T Master2 FAR and me with a single dual-core 
Opteron 275 on a Tyan 2892.

> another thing: does the bug only happen with the SMP kernel?

I just tried a non-SMP kernel and was unable to repro.  Bongani can you 
confirm?

Comment 24 Ingo Molnar 2005-08-01 14:18:28 UTC
could you check one more thing: could you bind your shell to one of the CPUs
(e.g. the first CPU), via "taskset -p 01 $$"? This way all commands started from
that shell will run on CPU#0. Can you reproduce the bug with such a 'serialized'
setup? I.e. does the bug depend on true SMP parallelism?

(taskset is in the schedutils package)
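
For tests that should not depend on the shell's affinity, the same binding can be done from inside a test program. This is only a sketch, assuming Linux with glibc; pin_to_first_allowed_cpu() is a hypothetical helper name, and it picks the first CPU the process is already allowed to use rather than hard-coding CPU#0:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to the first CPU it is currently allowed to run on.
 * Returns that CPU's index, or -1 on failure. Children created later with
 * fork() inherit the affinity mask. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Calling this at the top of a reproducer before it forks keeps every child on one CPU, which is the 'serialized' setup asked about above.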
Comment 25 Ingo Molnar 2005-08-01 14:32:27 UTC
please try to do all testing on CPU#0, initially.

but if it's not reproducible bound to a single CPU, feel free to play with other
possibilities - like compiling on CPU#0 from one shell, and doing the other
compile on CPU#1.

we already know the UP kernel does not show the bug, so what might help a bit is
to figure out what type of parallelism is needed to trigger the bug.

you might also want to figure out a simpler reproducer. Does it need a full
kernel compile to get the segfaults - or is it enough to compile two C files in
parallel on two CPUs to trigger the faults, etc.
Comment 26 Chris Caputo 2005-08-01 16:29:11 UTC
On Mon, 1 Aug 2005, mingo@elte.hu wrote:
> could you check one more thing: could you bind your shell to one of the 
> CPUs (e.g. the first CPU), via "taskset -p 01 $$"? This way all commands 
> started from that shell will run on CPU#0. Can you reproduce the bug 
> with such a 'serialized' setup? I.e. does the bug depend on true SMP 
> parallelism?

If both compiles are run on CPU #0 the bug still happens.

If the nmap compile is done exclusively on CPU #0 while the gdb compile is 
done exclusively on CPU #1 the bug still happens.

> you might also want to figure out a simpler reproducer. Does it need a
> full kernel compile to get the segfaults - or is it enough to compile
> two C files in parallel on two CPUs to trigger the faults, etc.

I haven't run into the problem with kernel compiles.  For me the repro 
happens during the "./configure" stage of the nmap/gdb builds using the 
"emerge" function on Gentoo.  The programs which have crashed have been: 
cat, chmod, gcc, grep, mkdir, mv, rm, sed and x86_64-pc-linux.

Interestingly, while trying to find an easier repro, the following silly 
program...

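   #include <stdio.h>
   #include <unistd.h>

   int main(void)
   {
     int i = 1000, pid;

     while (i--)
       {
         pid = fork();
         if (pid == 0)
           {
             /* execl() needs a terminating null pointer after the last arg */
             execl("/usr/bin/rm", "rm", "-f", "/tmp/x", (char *)NULL);
             _exit(127);   /* only reached if execl() itself fails */
           }
         printf(".");
         usleep(10000);
       }
     return 0;
   }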

... when run from two shells at the same time occasionally reproduces 
Red Hat bugzilla #160341, as evidenced by the following being spit out:

   Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
   You have invoked `ld.so', the helper program for shared library
   executables.
   [...]

There was no fault reported when that happened and this test program was 
the first time I have seen that.  Strangely one shell had it happen over 
and over again while another shell has yet to demonstrate it.  Both shells 
have processor affinity set to 0x03 (both CPUs).

I'm working to identify a better repro...
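
The test program above starts its children but never looks at how they exit. A sketch of a small harness that does, assuming the failure shows up as the child being killed by SIGSEGV; count_segv_deaths() is a hypothetical helper, and unlike the program above it runs the children one at a time and waits for each:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the command in argv `runs` times, one child at a time, and return how
 * many children were killed by SIGSEGV. Returns -1 if fork() or waitpid()
 * fails. */
int count_segv_deaths(char *const argv[], int runs)
{
    int deaths = 0;

    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127);             /* only reached if execvp() fails */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            deaths++;
    }
    return deaths;
}
```

Running it from two shells at once, with argv set to something like {"rm", "-f", "/tmp/x", NULL}, gives a count to compare between kernels instead of watching the console.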

Comment 27 Ingo Molnar 2005-08-02 02:57:13 UTC
could you boot the SMP kernel but with maxcpus=1? This basically has the effect
of binding all tasks to CPU#0.

if the bug does not occur with maxcpus=1, then it is strange that even if you
bind everything to CPU#0 in the 2-CPU case, the bug still happens. This means
that some other task, which may run on CPU#1, has an impact. That could be
anything from pdflush threads to kswapd threads. (don't try to taskset those
system threads, some of them are already bound and need to run on the right CPU!)

another thing: if you are testing this in X, could you do the testing on a text
console? That would ensure that by binding the shell to CPU#0, all tasks will
run on CPU#0 - while in the X case both X and gnome-terminal would be able to
run on CPU#1.
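
To double-check that a maxcpus=1 boot really left only one CPU running, the online CPU count can be queried from userspace. A minimal sketch, assuming glibc's sysconf() extension _SC_NPROCESSORS_ONLN; online_cpus() is a hypothetical helper name:

```c
#include <unistd.h>

/* Number of CPUs currently online, or -1 if it cannot be determined. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On the two-CPU machines discussed here one would expect 2 on a normal SMP boot and 1 with maxcpus=1.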
Comment 28 Chris Caputo 2005-08-02 10:14:14 UTC
On Tue, 2 Aug 2005, mingo@elte.hu wrote:
> could you boot the SMP kernel but with maxcpus=1? This basically has the 
> effect of binding all tasks to CPU#0.
>
> if the bug does not occur with maxcpus=1, then it is strange that even 
> if you bind everything to CPU#0 in the 2-CPU case, the bug still 
> happens. This means that some other task, which may run on CPU#1, has an 
> impact. That could be anything from pdflush threads to kswapd threads. 
> (don't try to taskset those system threads, some of them are already 
> bound and need to run on the right CPU!)

The bug does not happen with "maxcpus=1" on an SMP kernel.

> another thing: if you are testing this in X, could you do the testing on 
> a text console? That would ensure that by binding the shell to CPU#0, 
> all tasks will run on CPU#0 - while in the X case both X and 
> gnome-terminal would be able to run on CPU#1.

X is not installed on this machine.  All of my tests are being done via an 
SSH login.  Would there be any difference between an SSH session and a 
text console login?

By the way, I have simplified the repro a little.  I can now get it to 
happen by running the following program in one shell, while doing just an 
nmap compile in another shell.  A gdb compile is no longer needed:

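   int main(void)
   {
     /* volatile keeps the compiler from optimizing the busy loop away;
        unsigned avoids undefined behaviour on overflow */
     volatile unsigned long i = 1;
     while (i++);
     return 0;
   }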

Also, further info I have learned.  The nmap compile under Gentoo (done 
with the "emerge nmap" command) involves compiling in what Gentoo calls a 
"sandbox".  This sandbox sets LD_PRELOAD to "libsandbox.so" which is a 
library which intercepts and logs for later reporting, certain syscalls in 
order to keep the compile from going outside the sandbox.  I'm working to 
find a repro that doesn't involve the nmap compilation or the sandbox, but 
have not yet succeeded.  I am not sure why the sandbox/LD_PRELOAD makes a 
difference, but it seems to.  Of course, this doesn't explain how the 
other reporter is seeing the problem with a "make -j4" kernel compile, 
while I am unable to repro it doing that.

It seems that there is some kind of critical section or data that isn't 
being protected properly, and which gets occasionally trashed when both 
CPUs are non-idle.

Comment 29 Chris Caputo 2005-08-02 10:57:09 UTC
Created attachment 5479 [details]
Gentoo forum thread on the same subject.
Comment 30 Zwane Mwaikambo 2005-08-02 11:37:51 UTC
Has someone managed to reproduce this on an intel system?
Comment 31 Arjan van de Ven 2005-08-02 11:55:42 UTC
On Tue, Aug 02, 2005 at 10:14:20AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> 
> Also, further info I have learned.  The nmap compile under Gentoo (done 
> with the "emerge nmap" command) involves compiling in what Gentoo calls a 
> "sandbox".  This sandbox sets LD_PRELOAD to "libsandbox.so" which is a 
> library which intercepts and logs for later reporting, certain syscalls in 
> order to keep the compile from going outside the sandbox.  I'm working to 
> find a repro that doesn't involve the nmap compilation or the sandbox, but 
> have not yet succeeded.  I am not sure why the sandbox/LD_PRELOAD makes a 
> difference, but it seems to.  Of course, this doesn't explain how the 
> other reporter is seeing the problem with a "make -j4" kernel compile, 
> while I am unable to repro it doing that.
> 

actually libsandbox.so is broken:

--- libsandbox.c~	2005-08-02 18:54:52.000000000 -0400
+++ libsandbox.c	2005-08-02 18:54:52.000000000 -0400
@@ -677,7 +677,7 @@ fopen64(const char *pathname, const char
 	if FUNCTION_SANDBOX_SAFE_CHAR
 		("fopen64", canonic, mode) {
 		check_dlsym(fopen64);
-		result = true_fopen(pathname, mode);
+		result = true_fopen64(pathname, mode);
 		}
 
 	return result;

fixes this one for me

(btw libsandbox.so code is, well, let's say a bit of a shock when you're used
to looking at kernel code)

Comment 32 Bongani Hlope 2005-08-02 12:54:19 UTC
This is what I have to report back. I run make -j4 for kernel and make -j4 for 
qt on different terminals to reproduce this bug. 
 
1. On an SMP kernel I can still see this bug  
2. Non-SMP kernel I can't reproduce 
3. SMP kernel with maxcpus=1, I can't reproduce 
4. SMP kernel running schedtools -a 0x1 -e make -j4 for the kernel build,  
while running schedtools -a 0x2 -e make -j4 for qt on another terminal I can't 
reproduce the bug. 
 
I added some code to print arg_start and arg_end for the failing applications; 
for a failing gcc process, (arg_end - arg_start) gave me 33. So it's not long 
command line arguments that are causing this. 
Comment 33 Bongani Hlope 2005-08-02 13:32:28 UTC
More report back. 
 
After testing a non-SMP kernel I rebuilt an SMP kernel. The strange thing was 
that on this kernel I could not reproduce the bug. Before I went to get a 
rope to shoot myself for sending you guys on a wild goose chase, I compared the 
working config file against the previous non-working one. I rebuilt the kernel 
using the breaking config file and could still reproduce the bug. 
 
It seems like the bug is dependent on selecting NUMA settings. Here is the 
diff between the working and the breaking kernels. 
 
--- /usr/src/config     2005-08-02 06:42:55.000000000 +0200 
+++ /usr/src/linux-2.6.8/.config        2005-08-02 19:52:29.000000000 +0200 
@@ -1,7 +1,7 @@ 
 # 
 # Automatically generated make config: don't edit 
 # Linux kernel version: 2.6.13-rc4 
-# Sat Jul 30 09:36:44 2005 
+# Tue Aug  2 19:52:29 2005 
 # 
 CONFIG_X86_64=y 
 CONFIG_64BIT=y 
@@ -90,19 +90,16 @@ 
 # CONFIG_PREEMPT_VOLUNTARY is not set 
 CONFIG_PREEMPT=y 
 CONFIG_PREEMPT_BKL=y 
-CONFIG_K8_NUMA=y 
+# CONFIG_K8_NUMA is not set 
 # CONFIG_NUMA_EMU is not set 
-CONFIG_ARCH_DISCONTIGMEM_ENABLE=y 
-CONFIG_NUMA=y 
-CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y 
-CONFIG_ARCH_SPARSEMEM_ENABLE=y 
+# CONFIG_NUMA is not set 
+CONFIG_ARCH_FLATMEM_ENABLE=y 
 CONFIG_SELECT_MEMORY_MODEL=y 
-# CONFIG_FLATMEM_MANUAL is not set 
+CONFIG_FLATMEM_MANUAL=y 
 # CONFIG_DISCONTIGMEM_MANUAL is not set 
-CONFIG_SPARSEMEM_MANUAL=y 
-CONFIG_SPARSEMEM=y 
-CONFIG_NEED_MULTIPLE_NODES=y 
-CONFIG_HAVE_MEMORY_PRESENT=y 
+# CONFIG_SPARSEMEM_MANUAL is not set 
+CONFIG_FLATMEM=y 
+CONFIG_FLAT_NODE_MEM_MAP=y 
 CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y 
 CONFIG_HAVE_DEC_LOCK=y 
 CONFIG_NR_CPUS=2 
@@ -147,7 +144,6 @@ 
 CONFIG_ACPI_FAN=m 
 CONFIG_ACPI_PROCESSOR=m 
 CONFIG_ACPI_THERMAL=m 
-CONFIG_ACPI_NUMA=y 
 # CONFIG_ACPI_ASUS is not set 
 # CONFIG_ACPI_IBM is not set 
 # CONFIG_ACPI_TOSHIBA is not set 
 
Comment 34 Chris Caputo 2005-08-02 13:49:20 UTC
Zwane> Has someone managed to reproduce this on an intel system?

Not to my knowledge.  One person on the Gentoo forum reported it on a P4, but it
doesn't seem conclusively to be the same issue.  I inquired if they had a
/proc/sys/kernel/randomize_va_space and they did not (ie. older kernel).  They
mentioned that their issue seemed to go away when they "turned off support for
PAGEEXEC in the kernel (for PAX protection)".
Comment 35 Chris Caputo 2005-08-02 13:58:13 UTC
> actually libsandbox.so is broken:
[...]
> -		result = true_fopen(pathname, mode);
> +		result = true_fopen64(pathname, mode);

The version I am running, 1.2.11, has the true_fopen64.
Comment 36 Bongani Hlope 2005-08-02 14:13:28 UTC
Hi Chris 
 
The configuration file, without NUMA support seems to work fine for me. I even 
pushed the kernel compile to make -j8, while running make -j4 for qt on 
another terminal 
Comment 37 Chris Caputo 2005-08-02 14:39:30 UTC
My config does _not_ have NUMA enabled and yet the problem happens.

Bongani, does your system still show 2 CPUs when you have NUMA disabled?
Comment 38 Bongani Hlope 2005-08-02 14:59:25 UTC
Yes it still shows 2 CPUs. With NUMA enabled I quickly get: 
 
gcc[30326] general protection rip:404498 rsp:7fffffd157a0 error:0 
gcc[30326] arg_start: 140737485307281 aeg_end: 140737485307314 
 
Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy 
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc 
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner 
bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom 
i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video 
thermal processor hotkey fan button ac 
Pid: 30326, comm: gcc Not tainted 2.6.13-rc4 
RIP: 0033:[<0000000000404498>] [<0000000000404498>] 
RSP: 002b:00007fffffd157a0  EFLAGS: 00010202 
RAX: 0000000000000000 RBX: 000000000051b5b8 RCX: 000000000051b650 
RDX: 000000000051b630 RSI: 00002aaaaadee748 RDI: 0000000000001000 
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 
R10: 00002aaaaadee620 R11: 0000000000001010 R12: 000000000051b5e0 
R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001 
FS:  00002aaaaadf2b00(0000) GS:ffffffff80572880(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 00002aaaaac21c90 CR3: 0000000039752000 CR4: 00000000000006e0 
 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66 
 
grep[18028]: segfault at 000000000000000e rip 000000000040da42 rsp 
00007fffffd11a40 error 4 
rm[23322] general protection rip:2aaaaac32260 rsp:7fffffd09768 error:0 
rm[23322] arg_start: 140737485252100 arg_end: 140737485252103 
 
Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy 
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc 
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner 
bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom 
i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video 
thermal processor hotkey fan button ac 
Pid: 23322, comm: rm Not tainted 2.6.13-rc4 
RIP: 0033:[<00002aaaaac32260>] [<00002aaaaac32260>] 
RSP: 002b:00007fffffd09768  EFLAGS: 00010287 
RAX: 65722e006e79642e RBX: 0000000000000000 RCX: 0000000000000001 
RDX: 0000000000000001 RSI: 00007fffffd0a605 RDI: 65722e006e79642e 
RBP: 00007fffffd097c0 R08: 00007fffffd097dc R09: 00007fffffd097d8 
R10: 65722e006e79642e R11: 00002aaaaac32200 R12: 0000000000407280 
R13: 00007fffffd09a00 R14: 0000000000000000 R15: 0000000000000000 
FS:  00002aaaaadf2b00(0000) GS:ffffffff80572800(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 00002aaaaac31c10 CR3: 000000005f59b000 CR4: 00000000000006e0 
 f3 a4 4c 89 d0 c3 90 90 90 90 90 90 90 90 90 90 48 89 d0 48 
 
genksyms[4217] general protection rip:400cd3 rsp:7fffffd08c40 error:0 
genksyms[4217] arg_start: 140737485250339 arg_end: 140737485250340 
 
Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy 
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc 
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner 
bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom 
i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video 
thermal processor hotkey fan button ac 
Pid: 4217, comm: genksyms Not tainted 2.6.13-rc4 
RIP: 0033:[<0000000000400cd3>] [<0000000000400cd3>] 
RSP: 002b:00007fffffd08c40  EFLAGS: 00010206 
RAX: 0000000000000000 RBX: 6b6172646e614d28 RCX: 00000000005125e7 
RDX: 0000000000000004 RSI: 0000000000000001 RDI: 00000000005125d9 
RBP: 00000000005125d9 R08: 0000000000000004 R09: 0000000000000003 
R10: 0000000000516fd0 R11: 0000000000000000 R12: 0000000000000001 
R13: 0000000000000024 R14: 00007fffffd0932a R15: 00000000005125d9 
FS:  00002aaaaadf2b00(0000) GS:ffffffff80572800(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 000000000050ad58 CR3: 0000000021110000 CR4: 00000000000006e0 
 8b 4b 10 83 f9 01 75 cc 30 c9 41 83 fc 01 44 89 e2 75 d5 30 
 
I added this "debug" line: 
gcc[30326] arg_start: 140737485307281 aeg_end: 140737485307314 
rm[23322] arg_start: 140737485252100 arg_end: 140737485252103 
 
to see the size of the command line arguments 
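
The subtraction can be checked mechanically against the debug lines above. A small sketch; arg_bytes() is a hypothetical helper, and it skips the label before the second number instead of matching it, so the misspelled "aeg_end:" in the gcc line parses as well:

```c
#include <stdio.h>

/* Parse a line such as
 *   "gcc[30326] arg_start: 140737485307281 aeg_end: 140737485307314"
 * and return arg_end - arg_start in bytes, or -1 if the line does not parse. */
long long arg_bytes(const char *line)
{
    unsigned long long start, end;

    if (sscanf(line, "%*s arg_start: %llu %*s %llu", &start, &end) != 2)
        return -1;
    if (end < start)
        return -1;
    return (long long)(end - start);
}
```

For the gcc line this gives 33 bytes and for the rm line 3 bytes, so neither command line is anywhere near long.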
Comment 39 Chris Caputo 2005-08-02 16:40:42 UTC
Another repro method, from a person on the Gentoo forum:

  tar -xjvf gimp-2.3.2.tar.bz2
  cd gimp-2.3.2/
  ./configure
  make -j4

  "It usually compiles fine for a while (2-5 minutes) before breaking. It seems
random though because I can restart the compilation (no "make clean") and it
will work for a little while and then crash in a different place. Sometimes a
segfault or general prot won't cause make to error out. Here are the errors I
got this time. I have another window open with root running

  sed[29234] general protection rip:40870a rsp:7fffffd1a020 error:0
  sed[31892] general protection rip:40870a rsp:7fffffd189b0 error:0

I had firefox, top, folding@home and xmms running also" [...]

NOTE: The RIP on these is different than the ones where the problem happens in
_dl_start().
Comment 40 Andrew Morton 2005-08-02 17:27:56 UTC
> Another repro method, from a person on the Gentoo forum:

Did that person also find that

	echo 0 > /proc/sys/kernel/randomize_va_space

fixed it up?

Comment 41 Chris Caputo 2005-08-02 18:14:54 UTC
Yes, the gimp compile repro was fixed with:

  echo 0 > /proc/sys/kernel/randomize_va_space

I think this means we can say the Gentoo sandbox LD_PRELOAD aggravates the
problem but that the problem happens regardless.
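
When collecting reports it helps to record the randomization setting together with each result. A minimal sketch that reads the value back; read_sysctl_int() is a hypothetical helper, and the path is the one used in the echo command above:

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file such as
 * /proc/sys/kernel/randomize_va_space.
 * Returns the value, or -1 if the file is missing or unparsable. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A return of -1 for /proc/sys/kernel/randomize_va_space would point to a kernel without the knob, like the older kernel in the P4 report mentioned earlier.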
Comment 42 Arjan van de Ven 2005-08-03 02:03:23 UTC
On Tue, Aug 02, 2005 at 06:14:59PM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> I think this means we can say the Gentoo sandbox LD_PRELOAD aggravates the
> problem but that the problem happens regardless.

the sandbox code is REALLY bad and depends on a series of undefined glibc
behaviors and orderings of shared libraries that get loaded, and as such
it's hard to be sure it's the same bug we're chasing in this bug. In
addition, are all these gentoo reports with a kernel.org kernel or with a
kernel patched with PaX and whatnot? If it's the latter then we need to
discard those for now (simply because 4 level pagetables went in about the
same time and PaX is more likely to interact with that)


Can someone point to the latest libsafe code just in case? 

Comment 43 Martin Schlemmer 2005-08-03 02:52:12 UTC
> the sandbox code is REALLY bad and depends on a series of undefined glibc
> behaviors and orderings of shared libraries that get loaded, and as such
> it's hard to be sure it's the same bug we're chasing in this bug.

You should have seen it some time ago :/  I am slowly trying to clean it up, but
it's not that high on my priority list.  If you can however point out something
definite, I will be more than happy to look at it.

> Can someone point to the latest libsafe code just in case? 

libsafe or libsandbox ?
Comment 44 Arjan van de Ven 2005-08-03 03:00:16 UTC
On Wed, Aug 03, 2005 at 02:52:36AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> You should have seen it some time ago :/  I am slowly trying to clean it up, but
> it's not that high on my priority list.  If you can however point out something
> definite, I will be more than happy to look at it.

apparently I looked at a slightly older one so I'd like to look at the
latest one first before continuing. (Well not "like to" per se, I just had
lunch :)
> 
> > Can someone point to the latest libsafe code just in case? 
> 
> libsafe or libsandbox ?

eh libsandbox

Comment 45 Martin Schlemmer 2005-08-03 08:52:40 UTC
The latest svn sources are here:

  ftp://ftp.nosferatu.za.org/pub/sandbox-1.2.11.tar.bz2

As I said, I know there are still a few issues and implementation details
that need cleanup ... the original writer left it to us in bad shape.
Among the main things that I will need to address in the near future is to
actually have versioned symbols, and both old/new versions when building on
glibc.  The last few months, however, were mostly spent trying to get all known
bugs (corruption, etc.) fixed and some cleanups done.
Comment 46 Chris Caputo 2005-08-03 09:48:09 UTC
On Wed, 3 Aug 2005, arjanv@redhat.com wrote:
> In addition, are all these gentoo reports with a kernel.org kernel or 
> with a kernel patched with PaX and whatnot? If it's the latter then we 
> need to discard those for now (simply because 4 level pagetables went in 
> about the same time and PaX is more likely to interact with that)

The Intel person I think we can rule out was a PaX user.

The AMD users have reported the problem with kernel.org's 2.6.12.3 and 
2.6.13-rc4.

Comment 47 Bongani Hlope 2005-08-03 11:06:59 UTC
>   tar -xjvf gimp-2.3.2.tar.bz2
>   cd gimp-2.3.2/
>   ./configure
>   make -j4
> 
>   "It usually compiles fine for a while (2-5 minutes) before breaking. It seems
> random though because I can restart the compilation (no "make clean") and it
> will work for a little while and then crash in a different place. Sometimes a
> segfault or general prot won't cause make to error out. Here are the errors I
> got this time. I have another window open with root running
> 

Yep, this is pretty much how I get this error

svn update
make -f Makefile.cvs
cd /home/bongani/development/cpp/kde/build/kdelibs
/home/bongani/development/cpp/kde/src/kdelibs/configure
make -j4;
 make install;

They happen randomly, which is why I still don't have the output of /proc/#/maps


>   sed[29234] general protection rip:40870a rsp:7fffffd1a020 error:0
>   sed[31892] general protection rip:40870a rsp:7fffffd189b0 error:0
> 

The RIPs have changed since some kernel version (I'll check when they changed); the new ones are around those values as well.
genksyms[4217] general protection rip:400cd3 rsp:7fffffd08c40 error:0
rm[23322] general protection rip:2aaaaac32260 rsp:7fffffd09768 error:0
grep[18028]: segfault at 000000000000000e rip 000000000040da42 rsp 00007fffffd11a40 error 4
gcc[30326] general protection rip:404498 rsp:7fffffd157a0 error:0
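
For the "segfault at ... error 4" line, the number is the x86 page-fault error code, and its low bits can be decoded as sketched below. decode_pf_error() is a hypothetical helper; only the three low bits are handled. The "error:0" on the general protection lines is a different error code and is not covered by this decoding:

```c
/* Low three bits of the x86 page-fault error code. */
struct pf_error {
    int present;  /* bit 0: 1 = protection violation, 0 = page not present */
    int write;    /* bit 1: 1 = write access, 0 = read access */
    int user;     /* bit 2: 1 = fault happened in user mode */
};

struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e;

    e.present = (code >> 0) & 1;
    e.write   = (code >> 1) & 1;
    e.user    = (code >> 2) & 1;
    return e;
}
```

Error 4 therefore reads as a user-mode read of a page that was not present, which fits grep touching address 0xe.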

Comment 48 Chris Caputo 2005-08-03 12:14:49 UTC
I confirm that this bug applies to 2.6.13-rc5 too.  Bongani, you may want to
update the "kernel version" for this bug, if you confirm the same.
Comment 49 Bongani Hlope 2005-08-04 12:47:43 UTC
Ok I tested 2.6.13-rc5 with i) NUMA-discontigmem, ii) NUMA-sparsemem 
and iii) a non-NUMA kernel. They all hit the bug, but the NUMA-discontigmem 
kernel silently fails with this error: 
 
i) NUMA-discontigmem 
 
make -f scripts/Makefile.build obj=arch/x86_64/kernel 
  gcc -Wp,-MD,arch/x86_64/kernel/.process.o.d  -nostdinc 
-isystem /usr/lib/gcc/x86_64-mandrake-linux-gnu/3.4.3/include -D__KERNEL__ 
-Iinclude  -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing 
-fno-common -ffreestanding -O2     -fomit-frame-pointer -g -march=k8 
-mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks  -Wno-sign-compare 
-funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow 
-Wdeclaration-after-statement     -DKBUILD_BASENAME=process 
-DKBUILD_MODNAME=process -c -o arch/x86_64/kernel/.tmp_process.o 
arch/x86_64/kernel/process.c 
:includes nested too deeply 
make[1]: *** [arch/x86_64/kernel/process.o] Error 1 
make: *** [arch/x86_64/kernel] Error 2 
 
The error goes away when I did echo 0 > /proc/sys/kernel/randomize_va_space 
 
 
ii) NUMA-sparsemem 
rm[23068] general protection rip:2aaaaac32260 rsp:7fffff908578 error:0 
rm[13440] general protection rip:2aaaaac32260 rsp:7fffffd09818 error:0 
gcc[10127] general protection rip:404498 rsp:7ffffff164e0 error:0 
 
checking for 
inttypes.h... /home/bongani/development/cpp/kde/src/kdelibs/configure: line 
7779: 13440 Segmentation fault      rm -f conftest.er1 
 
/bin/sh: line 1: 10127 Segmentation fault      (core dumped) gcc 
-Wp,-MD,drivers/acpi/namespace/.nsxfeval.o.d -nostdinc 
-isystem /usr/lib/gcc/x86_64-mandrake-linux-gnu/3.4.3/include -D__KERNEL__ 
-Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing 
-fno-common -ffreestanding -O2 -fomit-frame-pointer -g -march=k8 -mno-red-zone 
-mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -funit-at-a-time 
-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Os 
-DKBUILD_BASENAME=nsxfeval -DKBUILD_MODNAME=nsxfeval -c -o 
drivers/acpi/namespace/.tmp_nsxfeval.o drivers/acpi/namespace/nsxfeval.c 
 
Same errors as always 
 
iii) non-NUMA 
 
gcc[25983] general protection rip:404498 rsp:7fffffb16900 error:0 
 
/bin/sh: line 1: 25983 Segmentation fault      (core dumped) gcc 
-Wp,-MD,drivers/ide/.ide-taskfile.o.d -nostdinc 
-isystem /usr/lib/gcc/x86_64-mandrake-linux-gnu/3.4.3/include -D__KERNEL__ 
-Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing 
-fno-common -ffreestanding -O2 -fomit-frame-pointer -g -march=k8 -mno-red-zone 
-mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -funit-at-a-time 
-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement 
-Idrivers/ide -DKBUILD_BASENAME=ide_taskfile -DKBUILD_MODNAME=ide_core -c -o 
drivers/ide/.tmp_ide-taskfile.o drivers/ide/ide-taskfile.c 
make[2]: *** [drivers/ide/ide-taskfile.o] Error 139 
 
So the non-NUMA kernel also fails, the previous success for this seems like a 
fluke. 
Comment 50 Brandon Stewart 2005-08-09 09:25:50 UTC
I cannot reproduce this with my dual opteron 246 system. Compiling kernel, gimp
source, running prime95 while compiling gimp, it all works fine on 2.6.13-rc6.
I've also been using 13-rc2, rc3, rc4, and rc5 and have not noticed the problem
on any of them (though I didn't really stress previous rc's).

opteron:/home/brandon/tmp/crap # cat /proc/sys/kernel/randomize_va_space
1

I have 4 gigs memory and am not using any swap. Must swap be enabled for this
problem to occur?

config: http://d3.vunct.net/yx619.txt
dmesg: http://d3.vunct.net/oy620.txt
lsmod: http://d3.vunct.net/zy621.php
lspci: http://d3.vunct.net/jz622.php
Comment 51 Jules Colding 2005-08-10 02:31:44 UTC
I seem to experience this problem too. I have not seen any segfaults since I did
the "echo 0 > /proc/sys/kernel/randomize_va_space" trick. Full report with lots
of info here:

<http://kerneltrap.org/mailarchive/1/message/101754/thread>
Comment 52 Arjan van de Ven 2005-08-10 02:41:25 UTC
Can the people who use the patched gentoo kernel sources (the one with PaX)
please not report things into this bug (or if you already have, please say so)
because PaX adds another layer of randomisation and that may well conflict, and
is only confusing this bugreport with effectively useless information as a result.
Comment 53 Jules Colding 2005-08-10 02:52:57 UTC
I am not using the PaX enabled kernel by the "Hardened Gentoo" subproject. The
kernel is the normal gentoo-sources. 
Comment 54 Andi Kleen 2005-08-10 03:03:29 UTC
I guess we need to somehow capture /proc/pid/maps of a segfaulting application. 
Unfortunately it's difficult because it has already segfaulted. 
What would work is to run it in gdb and when you get the segfault Ctrl-z 
and save /proc/pidofapplication/maps 
 
Then attach that to the bug report. 
 
Also include a known good maps of the same app.  
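
While the process is still alive (for example stopped under gdb as described above), the maps file can also be searched directly for the mapping that covers a given address such as the faulting RIP. A sketch only; find_mapping() is a hypothetical helper and it assumes the usual "start-end perms ..." line format of /proc/<pid>/maps:

```c
#include <stdio.h>

/* Find the /proc/<pid>/maps line whose address range contains addr.
 * Copies that line into out and returns 1; returns 0 if no mapping contains
 * addr, or -1 if the maps file cannot be opened. */
int find_mapping(int pid, unsigned long addr, char *out, size_t outlen)
{
    char path[64], line[512];
    unsigned long start, end;
    FILE *f;
    int found = 0;

    snprintf(path, sizeof(path), "/proc/%d/maps", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;               /* not a mapping line */
        if (addr >= start && addr < end) {
            snprintf(out, outlen, "%s", line);
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A result of 0 for the faulting RIP or for CR2 would be worth including in the report next to the full maps output.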
 
(this bugzilla really needs NEEDINFO) 
 
Comment 55 PaX Team 2005-08-10 06:01:38 UTC
re: comment #52: Arjan, FYI, PaX wasn't released for .12 and .13 yet, therefore 
all reports here have nothing to do with it (not to mention that i won't 
add 'another layer', i'll simply replace it with a better one). If there's an 
adventurous soul out there who can reproduce the problem and wants to try it 
under PaX then feel free to email me in private.

re: comment #54: why don't you log /proc/pid/maps on the GPF as well?
Comment 56 Chris Caputo 2005-08-10 12:34:25 UTC
Created attachment 5589 [details]
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes.

I didn't have luck with gdb so I patched traps.c (patch now attached to this
bug) to show the maps at the time of the fault.  Please review my patch for any
errors - in particular whether the for loop in show_maps() catches all of the
memory areas properly.  I ask because I am surprised not to see stack areas in
the results below - then again maybe that is an indicator of the problem.  Here
are some results with 2.6.13-rc6:

grep[10384] general protection rip:2aaaaaaac274 rsp:7fffff9bcda0 error:0

Modules linked in:
Pid: 10384, comm: grep Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007fffff9bcda0  EFLAGS: 00010286
RAX: 8c8afb8a4852a409 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004c845a2eac RSI: 8c8afb8a4852a409 RDI: 8c8afb8a484155f1
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fffff9bce10 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000230eb7000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
00515000-00516000 rw-p 00515000 08:02 522299 [heap]
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so


rm[10836] general protection rip:2aaaaaaac274 rsp:7ffffffbd8b0 error:0

Modules linked in:
Pid: 10836, comm: rm Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbd8b0  EFLAGS: 00010292
RAX: 9d96be7b808c9bc7 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004d854ff8e1 RSI: 9d96be7b808c9bc7 RDI: 9d96be7b807b4daf
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbd920 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 000000023184e000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00409000 r-xp 00000000 08:02 554409 /bin/rm
00508000-00509000 rw-p 00008000 08:02 554409 /bin/rm
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so


sed[12011] general protection rip:2aaaaaaac274 rsp:7ffffffbe060 error:0

Modules linked in:
Pid: 12011, comm: sed Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbe060  EFLAGS: 00010286
RAX: d0dfc8413e8ce609 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004ef8c9161d RSI: d0dfc8413e8ce609 RDI: d0dfc8413e7b97f1
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbe0d0 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000234ec3000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 08:02 507682 /bin/sed
0051a000-0051b000 rw-p 0001a000 08:02 507682 /bin/sed
0051b000-00523000 rw-p 0051b000 08:02 507682 [heap]
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so


chmod[12223] general protection rip:2aaaaaaac274 rsp:7fffff9bdfb0 error:0

Modules linked in:
Pid: 12223, comm: chmod Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007fffff9bdfb0  EFLAGS: 00010286
RAX: d08db7357b9badcf RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004fb06107a4 RSI: d08db7357b9badcf RDI: d08db7357b8a5fb7
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fffff9be020 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000230ff0000 CR4: 00000000000006e0
 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00409000 r-xp 00000000 08:02 554392 /bin/chmod
00508000-00509000 rw-p 00008000 08:02 554392 /bin/chmod
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
Comment 57 Arjan van de Ven 2005-08-10 13:05:47 UTC
On Wed, Aug 10, 2005 at 12:34:34PM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> grep[10384] general protection rip:2aaaaaaac274 rsp:7fffff9bcda0 error:0
> /proc/$$/maps:
> 00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
> 00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
> 00515000-00516000 rw-p 00515000 08:02 522299 [heap]
> 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
> 2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so

ok now it gets interesting. For some reason, rip has run "off" the mapped
area of ld.so it seems.....

can you get the following info?

eu-readelf -l /lib64/ld-2.3.5.so

that is supposed to give an idea of where the .so file says it wants to be
mapped..

Comment 58 Bongani Hlope 2005-08-10 13:08:00 UTC
Created attachment 5590 [details]
Dump maps of failing program this is againt 2.6.13-rc5

;) I've just started testing a patch I've been working on as well. My patch is
based on fs/seq_file.c and fs/proc/task_mmu.c. It also needs reviewing.

Chris, I'll take a look at your patch; maybe we had the same idea. These are the
dumps I got using my patch. Oh, I also applied it to traps.c (it seems like the
only logical place to put it). Now back to the dumps:

gcc[29251] general protection rip:404498 rsp:7fffff917250 error:0
00400000-00417000 r-xp 00000000 03:06 7375679 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375679 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 00:00 0 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 00:00 0
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaa

gcc[26423] general protection rip:404498 rsp:7ffffff15d70 error:0
00400000-00417000 r-xp 00000000 03:06 7375679 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375679 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 00:00 0 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 00:00 0
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaa
Comment 59 Chris Caputo 2005-08-10 13:14:50 UTC
On Wed, 10 Aug 2005, arjanv@redhat.com wrote:
>> grep[10384] general protection rip:2aaaaaaac274 rsp:7fffff9bcda0 error:0
>> /proc/$$/maps:
>> 00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
>> 00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
>> 00515000-00516000 rw-p 00515000 08:02 522299 [heap]
>> 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
>> 2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
>
> ok now it gets interesting. For some reason, rip has run "off" the mapped
> area of ld.so it seems.....

But 2aaaaaaac274 is between 2aaaaaaab000 and 2aaaaaac0000, no?
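
The containment question can be checked mechanically. A minimal Python sketch, using the map lines and RIP quoted above; the helper names parse_range and find_mapping are illustrative only and do not come from any patch in this thread:

```python
# Map lines and RIP copied from the grep[10384] report quoted above.
MAPS = [
    "00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep",
    "00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep",
    "00515000-00516000 rw-p 00515000 08:02 522299 [heap]",
    "2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so",
    "2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so",
]
RIP = 0x2AAAAAAAC274


def parse_range(line):
    """Return (start, end) from the first field of a maps-style line."""
    start, end = line.split()[0].split("-")
    return int(start, 16), int(end, 16)


def find_mapping(addr, lines):
    """Return the line whose half-open range [start, end) contains addr."""
    for line in lines:
        start, end = parse_range(line)
        if start <= addr < end:
            return line
    return None


hit = find_mapping(RIP, MAPS)
offset = RIP - parse_range(hit)[0]
print(hit)
print("offset into mapping: %#x" % offset)
```

With these inputs the RIP lands in the r-xp mapping of ld-2.3.5.so, so on this evidence it has not run off the mapped text.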

> can you get the following info?
>
> eu-readelf -l /lib64/ld-2.3.5.so
>
> that is supposed to give an idea of where the .so file says it wants to be
> mapped..

# eu-readelf -l /lib64/ld-2.3.5.so
Program Headers:
   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
   LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x014b58 0x014b58 R E 0x100000
   LOAD           0x014c40 0x0000000000114c40 0x0000000000114c40 0x000be8 0x000d68 RW  0x100000
   DYNAMIC        0x014e18 0x0000000000114e18 0x0000000000114e18 0x000180 0x000180 RW  0x8
   GNU_EH_FRAME   0x01341c 0x000000000001341c 0x000000000001341c 0x0004b4 0x0004b4 R   0x4
   GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x8
   GNU_RELRO      0x014c40 0x0000000000114c40 0x0000000000114c40 0x0003c0 0x0003c0 R   0x1
   LOOS+84153728  0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000     0x8

  Section to Segment mapping:
   Segment Sections...
    00      [RO: .hash .dynsym .dynstr .gnu.version .gnu.version_d .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame]
    01      [RELRO: .data.rel.ro .dynamic .got] .data .bss
    02      [RELRO: .dynamic]
    03      [RO: .eh_frame_hdr]
    04
    05      [RELRO: .data.rel.ro .dynamic .got]
    06
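
To compare "where the .so file says it wants to be mapped" with what the kernel actually mapped, the two LOAD headers above can be rounded out to page boundaries and offset by the load base. A minimal sketch, assuming 4 KiB pages and taking the base 0x2aaaaaaab000 from the maps dumps in this thread; page_span is a hypothetical helper name:

```python
PAGE = 0x1000           # assumption: 4 KiB pages
BASE = 0x2AAAAAAAB000   # start of the first ld-2.3.5.so mapping in the dumps


def page_span(vaddr, memsz, base=BASE):
    """Page-aligned [start, end) covered by a LOAD segment placed at base."""
    start = (base + vaddr) & ~(PAGE - 1)
    end = (base + vaddr + memsz + PAGE - 1) & ~(PAGE - 1)
    return start, end


# VirtAddr and MemSiz of the two LOAD entries in the eu-readelf output.
text_span = page_span(0x000000, 0x014B58)
data_span = page_span(0x114C40, 0x000D68)

# CR2 value shown in the register dumps (last page-fault address).
CR2 = 0x2AAAAABBFD90

print("text: %x-%x" % text_span)
print("data: %x-%x" % data_span)
print("CR2 inside data span:", data_span[0] <= CR2 < data_span[1])
```

Under those assumptions the computed spans coincide with the two ld-2.3.5.so lines in the maps dumps (r-xp and rw-p), which suggests the layout itself is what the ELF headers ask for.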

Comment 60 PaX Team 2005-08-10 16:19:57 UTC
re: comment #56: your for loop checks for map->vm_next, it's not needed (this is
why you don't see the stack vma which is normally the last one in the vma list
and has a NULL vm_next field). also, can you add a do_coredump() call as well
then make the core available or analyze it yourself, in particular you should
follow Ulrich Drepper's instructions to verify that the initial memory accesses
in ld.so read proper values, it seems that memory is trashed somehow, you should
see that in the coredump (don't forget to comment out the !vma->anon_vma check
in maydump(), as Linus already pointed out above).
Comment 61 Chris Caputo 2005-08-10 21:56:13 UTC
Created attachment 5597 [details]
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes.

Fix bug in patch in which final memory area wasn't being shown.
Comment 62 Chris Caputo 2005-08-10 23:15:45 UTC
On Wed, 10 Aug 2005, pageexec@freemail.hu wrote:
> re: comment #56: your for loop checks for map->vm_next, it's not needed 
> (this is why you don't see the stack vma which is normally the last one 
> in the vma list and has a NULL vm_next field).

Thanks for the fix.

> also, can you add a do_coredump() call as well then make the core 
> available or analyze it yourself, in particular you should follow Ulrich 
> Drepper's instructions to verify that the initial memory accesses in 
> ld.so read proper values, it seems that memory is trashed somehow, you 
> should see that in the coredump (don't forget to comment out the 
> !vma->anon_vma check in maydump() as Linus pointed it already out 
> above).

I created a core dump with the !vma->anon_vma check in maydump() removed.

Accessing the data in the core dump and doing the math to match the 
disassembly produces valid data, whereas the register dump shows invalid 
data.  This returns me to my speculation in comment #18 of "Is it possible 
the page with this data wasn't fully instantiated when the code ran, but 
was fully instantiated by the time the core dump happened?"

By the way, here's a corrected map dump with the stack area included:

rm[3706] general protection rip:2aaaaaaac274 rsp:7fffffbbdeb0 error:0

Modules linked in:
Pid: 3706, comm: rm Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007fffffbbdeb0  EFLAGS: 00010296
RAX: d5d0b6397b499cc3 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 00000051350aa74c RSI: d5d0b6397b499cc3 RDI: d5d0b6397b384eab
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fffffbbdf20 R15: 0000000000000000
FS:  00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000231880000 CR4: 00000000000006e0
  48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00409000 r-xp 00000000 08:02 554409 /bin/rm
00508000-00509000 rw-p 00008000 08:02 554409 /bin/rm
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
7fffffbab000-7fffffbc1000 rw-p 7fffffbab000 08:02 537108 [stack]

For the faults I see, in the disassembly the relevant lines are:

mov    1131992(%rip),%rax  # either 1131992(%rip) is garbage (likely)
lea    -47(%rip),%rdi      # or -47(%rip) is garbage (unlikely, since an lea)
sub    %rax,%rdi
mov    %rdi,%rax
add    1129852(%rip),%rax  # RAX is consistently 0x114e18 more than RDI as
                            # expected, so problem starts prior to this add.
mov    (%rax),%rax         # BAM!

My perception is that the "mov 1131992(%rip),%rax" is sometimes resulting 
in junk data landing in RAX as a result of the data section of ld-2.3.5.so 
not being fully mapped.  Or something.
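
The two observations above (a constant RAX - RDI delta, and a fault on the final load) can be checked against the register dumps in this thread. A minimal sketch; the RAX/RDI pairs are copied from the dumps, while the 48-bit canonical-address rule is an assumption about the CPUs involved, and is_canonical is a hypothetical helper:

```python
# (RAX, RDI) pairs copied from the general-protection dumps in this thread.
DUMPS = {
    "rm[10836]":    (0x9D96BE7B808C9BC7, 0x9D96BE7B807B4DAF),
    "sed[12011]":   (0xD0DFC8413E8CE609, 0xD0DFC8413E7B97F1),
    "chmod[12223]": (0xD08DB7357B9BADCF, 0xD08DB7357B8A5FB7),
    "rm[3706]":     (0xD5D0B6397B499CC3, 0xD5D0B6397B384EAB),
}


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) are all zero or all one."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


deltas = {name: rax - rdi for name, (rax, rdi) in DUMPS.items()}
bad_rax = [name for name, (rax, _) in DUMPS.items() if not is_canonical(rax)]

for name in DUMPS:
    print(name, "RAX-RDI = %#x" % deltas[name],
          "canonical RAX:", is_canonical(DUMPS[name][0]))
```

Every pair differs by 0x114e18, which is also the DYNAMIC VirtAddr in the eu-readelf output from comment 59, and every RAX is non-canonical. A load through a non-canonical address raises a general protection fault rather than a page fault, which would be consistent with the "general protection ... error:0" lines.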

Comment 63 PaX Team 2005-08-11 04:12:07 UTC
re: comment #62: thanks for the data. my thoughts: since the data in the
coredump is apparently correct, we have a temporary glitch only (like we didn't
know it already :-). where can it come from?

1. PC-relative addressing is wrong 'sometimes'
2. the CPU D$ is somehow out of sync
3. the (physical) page doesn't yet contain all data from DMA
4. the TLB entries point to the wrong page
5. the page table entries point to the wrong page

you can verify 5 by adding a printk into mm/memory.c:do_no_page() to show the
arguments of set_pte_at() (probably you should add a strcmp(current->comm) check
so that you don't flood your logs), do the same in the GPF handler (needs a page
table walk) so we can compare. it'll probably not be the case since by the time
of the coredump the PTEs are apparently fine as they point to expected data and
the PTEs in question are not written in-between (in theory).

once you know that 5 is not the case, you can test case 4 by adding an explicit
local TLB flush into do_no_page() where it already has a comment to the contrary
(which is fine itself, this extra TLB flush would just make sure that the CPU
doesn't actually have any entry, if it turns out that it does, there's a problem
elsewhere in page table freeing/TLB flushing).

i don't know how to verify case 3, it 'should not happen' of course, but adding
some logging somewhere in the block or whatever layer could prove it.

for case 2, you could add an explicit wbinvd() call into do_no_page(), it should
ensure that the caches are in sync with main memory.

i have no idea how to test for 1, except for maybe writing a small program in
assembly that would have some data in .data and check it through PC-relative
addressing, then you'd run this program until it fails (that is, this should be
a standalone/statically linked app, without any crt*/etc linked in, quickest
hack is to just hexedit a copy of ld.so and have some app use the copy :-).

on a sidenote, everyone seemed to have solved/mitigated the issue by turning off
randomization, however in Chris Caputo's logs i don't see any sign of mmap
randomization yet the problem still manifested, how's that possible? or is it
somehow the stack randomization that interferes with file mappings?
Comment 64 Arjan van de Ven 2005-08-11 04:26:12 UTC
On Thu, Aug 11, 2005 at 04:12:16AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> on a sidenote, everyone seemed to have solved/mitigated the issue by turning off
> randomization, however in Chris Caputo's logs i don't see any sign of mmap
> randomization yet the problem still manifested, how's that possible? or is it
> somehow the stack randomization that interferes with file mappings?

One of the first reporters narrowed the assumed culprit down to the part where
the stack vma is randomized; with just that disabled, but with the other
randomisations (including stack pointer) still on, he saw no crashes. If we
assume that holds true for the other reporters as well, then, well, I'm quite
surprised how that happens. I agree that if library randomisation was involved
it would be a prime suspect, but at first sight it is not :-(

Also, in the dumps the stack is a long way away from any other mapping, so it
doesn't look like accidental overlap either.
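
The distance between the stack and its nearest neighbour can be read straight off a dump. A minimal sketch using the corrected rm[3706] map from comment 62; parse_maps, overlaps and gap_below are illustrative names, not existing tools:

```python
# Map dump copied from the rm[3706] report in comment 62.
MAPS_TEXT = """
00400000-00409000 r-xp 00000000 08:02 554409 /bin/rm
00508000-00509000 rw-p 00008000 08:02 554409 /bin/rm
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
7fffffbab000-7fffffbc1000 rw-p 7fffffbab000 08:02 537108 [stack]
"""


def parse_maps(text):
    """Return a list of (start, end, name) tuples, one per mapping line."""
    regions = []
    for line in text.strip().splitlines():
        fields = line.split()
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5] if len(fields) > 5 else ""
        regions.append((start, end, name))
    return regions


def overlaps(regions):
    """Return adjacent pairs (in address order) whose ranges overlap."""
    ordered = sorted(regions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a[1] > b[0]]


def gap_below(regions, name):
    """Bytes between the named mapping and the mapping just below it."""
    ordered = sorted(regions)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[2] == name:
            return cur[0] - prev[1]
    return None


regions = parse_maps(MAPS_TEXT)
gap = gap_below(regions, "[stack]")
print("overlapping mappings:", overlaps(regions))
print("gap below stack: %#x bytes" % gap)
```

For this dump there are no overlapping ranges, and the hole below the stack is tens of terabytes wide, which fits the remark that accidental overlap looks unlikely.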

Comment 65 Bongani Hlope 2005-08-11 12:10:11 UTC
Yes, the maps show that the stack is far away from the crashes. Library
randomisation doesn't seem to be the problem.  The following crashes happen in
different places. The first one is within ld.so and the rest are inside the
actual failing application.

ar[23024] general protection rip:2aaaaabe31d7 rsp:7fffff909da8 error:0

Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv
video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core
videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal
processor hotkey fan button ac
Pid: 23024, comm: ar Tainted: G   M  2.6.13-rc5
RIP: 0033:[<00002aaaaabe31d7>] [<00002aaaaabe31d7>]
RSP: 002b:00007fffff909da8  EFLAGS: 00010202
RAX: 00002aaaaad49820 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000034 RSI: 00007fffff907640 RDI: 702e746f672e0074
RBP: 000000000050aa40 R08: 00002aaaab0a9b00 R09: ff4b4b405e424aff
R10: 00002aaaaabc09e8 R11: 00002aaaaabe31d0 R12: 00007fffff90a505
R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
FS:  00002aaaab0a9b00(0000) GS:ffffffff80574800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaac3501d CR3: 00000000665c6000 CR4: 00000000000006e0
 48 39 47 20 74 06 b8 01 00 00 00 c3 48 83 7f 18 00 74 f3 66
/proc/$$/maps:
00400000-0040b000 r-xp 00000000 03:06 7375767 /usr/bin/ar
0050a000-0050b000 rw-p 0000a000 03:06 7375767 /usr/bin/ar
0050b000-0052c000 rw-p 0050b000 03:06 7375767 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaac4a000 r-xp 00000000 03:06 14696454 /usr/lib64/libbfd-2.15.92.0.2.so
2aaaaac4a000-2aaaaad49000 ---p 00089000 03:06 14696454 /usr/lib64/libbfd-2.15.92.0.2.so
2aaaaad49000-2aaaaad54000 rw-p 00088000 03:06 14696454 /usr/lib64/libbfd-2.15.92.0.2.so
2aaaaad54000-2aaaaad59000 rw-p 2aaaaad54000 03:06 14696454
2aaaaad74000-2aaaaad75000 rw-p 2aaaaad74000 03:06 14696454
2aaaaad75000-2aaaaad77000 r-xp 00000000 03:06 11911201 /lib64/libdl-2.3.4.so
2aaaaad77000-2aaaaae76000 ---p 00002000 03:06 11911201 /lib64/libdl-2.3.4.so
2aaaaae76000-2aaaaae78000 rw-p 00001000 03:06 11911201 /lib64/libdl-2.3.4.so
2aaaaae78000-2aaaaafa0000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaafa0000-2aaaab09f000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaab09f000-2aaaab0a2000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaab0a2000-2aaaab0a5000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaab0a5000-2aaaab0ab000 rw-p 2aaaab0a5000 03:06 11911311
7fffff8f6000-7fffff90b000 rw-p 7fffff8f6000 03:06 11911311 [stack]

sed[10100] general protection rip:40870a rsp:7fffffd188b0 error:0

Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv
video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core
videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal
processor hotkey fan button ac
Pid: 10100, comm: sed Tainted: G   M  2.6.13-rc5
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd188b0  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 0073726f74632e00 RCX: 5b2f257300652d00
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 0073726f74632e00
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffd18ad0 R14: 616c65722e00725f R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80574800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 0000000049fdc000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5668924 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5668924 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5668924 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311
7fffffd06000-7fffffd1c000 rw-p 7fffffd06000 03:06 11911311 [stack]

gcc[32622] general protection rip:404498 rsp:7ffffff172a0 error:0

Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv
video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core
videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal
processor hotkey fan button ac
Pid: 32622, comm: gcc Tainted: G   M  2.6.13-rc5
RIP: 0033:[<0000000000404498>] [<0000000000404498>]
RSP: 002b:00007ffffff172a0  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000051b6b8 RCX: 000000000051b5d0
RDX: 000000000051b5b0 RSI: 00002aaaaadee808 RDI: 0000000001000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 00002aaaaadee620 R11: 0000000001000010 R12: 000000000051b500
R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001
FS:  00002aaaaadf2b00(0000) GS:ffffffff80574880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaac21c90 CR3: 0000000042a51000 CR4: 00000000000006e0
 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66
/proc/$$/maps:
00400000-00417000 r-xp 00000000 03:06 7375679 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375679 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 03:06 7375679 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311
7ffffff03000-7ffffff19000 rw-p 7ffffff03000 03:06 11911311 [stack]
Comment 66 Bongani Hlope 2005-08-11 12:48:46 UTC
My x86-64 assembly sucks, so here goes (so does my x86 ;)

objdump -Sr /usr/bin/gcc, then search for 404498 (that's where RIP was pointing)

  404463:       75 33                   jne    404498 <strcat@plt+0x2c60>
  404465:       48 8b 03                mov    (%rbx),%rax
  404468:       48 89 42 08             mov    %rax,0x8(%rdx)
  40446c:       48 89 13                mov    %rdx,(%rbx)
  40446f:       48 8b 5c 24 08          mov    0x8(%rsp),%rbx
  404474:       48 8b 6c 24 10          mov    0x10(%rsp),%rbp
  404479:       4c 8b 64 24 18          mov    0x18(%rsp),%r12
  40447e:       4c 8b 6c 24 20          mov    0x20(%rsp),%r13
  404483:       4c 8b 74 24 28          mov    0x28(%rsp),%r14
  404488:       4c 8b 7c 24 30          mov    0x30(%rsp),%r15
  40448d:       48 83 c4 38             add    $0x38,%rsp
  404491:       c3                      retq
  404492:       41 89 45 08             mov    %eax,0x8(%r13)
  404496:       eb a6                   jmp    40443e <strcat@plt+0x2c06>
->  404498:       41 c7 06 00 00 00 00    movl   $0x0,(%r14)
  40449f:       eb c4                   jmp    404465 <strcat@plt+0x2c2d>
  4044a1:       66                      data16
  4044a2:       66                      data16
  4044a3:       66                      data16
  4044a4:       90                      nop
  4044a5:       66                      data16
  4044a6:       66                      data16
  4044a7:       66                      data16

according to the maps dump I just posted (thanx for the patch Chris)
r14 has this value: R14: 7478657400746f68

-----

The sed dump (I have three of them and all happen on the same RIP)

  4086f0:       48 89 5c 24 f0          mov    %rbx,0xfffffffffffffff0(%rsp)
  4086f5:       48 89 6c 24 f8          mov    %rbp,0xfffffffffffffff8(%rsp)
  4086fa:       31 c0                   xor    %eax,%eax
  4086fc:       48 83 ec 18             sub    $0x18,%rsp
  408700:       83 fe ff                cmp    $0xffffffffffffffff,%esi
  408703:       48 89 fb                mov    %rdi,%rbx
  408706:       89 f5                   mov    %esi,%ebp
  408708:       74 1d                   je     408727 <getopt_long@plt+0x665f>
-> 40870a:       48 8b 47 08             mov    0x8(%rdi),%rax
  40870e:       48 39 07                cmp    %rax,(%rdi)
  408711:       74 23                   je     408736 <getopt_long@plt+0x666e>

The sed dumps all have the following values
RDI: 0073726f74632e00 RAX: 0000000000000000

--

Oh, and the ar dump is inside /usr/lib64/libbfd-2.15.92.0.2.so and not ld.so
Comment 67 PaX Team 2005-08-11 15:49:29 UTC
re: comment #65 and #66: it's not clear to me that your problem is the same as
Chris Caputo's. in your case the invalid memory accesses are at addresses that
are clearly ascii strings, not 'random' garbage as is the case with Chris. that
is, it looks like there may be some kind of memory management bug in
ar/gcc/libbfd (buffer overflow, stale pointer access, etc). if you can isolate
the full command line that was running when the bug triggered, you could run it
through valgrind 3.0 repeatedly to see if it finds anything suspicious,
especially for a case when the GPF is triggered as well. of course this may in
fact point to the actual bug as well, like we know what the 'bad' page contained
(ascii strings, presumably produced by the same or same kind of application as
the strings are section names).
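
The "clearly ascii strings" observation can be reproduced by unpacking the register values byte by byte. A minimal sketch; the register values are copied from the dumps in comments 62 and 65, x86-64 is little-endian so the lowest byte comes first in memory, and both helper names and the six-byte threshold are arbitrary choices for illustration:

```python
def reg_to_ascii(value):
    """Render a 64-bit register as the 8 bytes it would occupy in memory."""
    out = []
    for b in value.to_bytes(8, "little"):
        if b == 0:
            out.append("\\0")            # show NUL terminators explicitly
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out)


def looks_like_text(value, min_printable=6):
    """Heuristic: only NUL/printable bytes, and mostly printable ones."""
    raw = value.to_bytes(8, "little")
    printable = sum(1 for b in raw if 0x20 <= b <= 0x7E)
    clean = all(b == 0 or 0x20 <= b <= 0x7E for b in raw)
    return clean and printable >= min_printable


SAMPLES = {
    "gcc R14 (comment 65)": 0x7478657400746F68,
    "sed RDI (comment 65)": 0x0073726F74632E00,
    "sed R14 (comment 65)": 0x616C65722E00725F,
    "ar RDI (comment 65)":  0x702E746F672E0074,
    "rm RAX (comment 62)":  0xD5D0B6397B499CC3,
}

for name, value in SAMPLES.items():
    print("%-22s %-34s text-like: %s"
          % (name, reg_to_ascii(value), looks_like_text(value)))
```

The comment 65 values decode to fragments such as ".ctors", ".got.p" and ".rela", which look like pieces of a section-name string table, while the comment 62 value does not decode to text at all.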
Comment 68 Bongani Hlope 2005-08-12 10:56:23 UTC
Comment on attachment 5590 [details]
Dump maps of failing program this is againt 2.6.13-rc5

Use Chris' patch
Comment 69 Bongani Hlope 2005-08-12 11:12:21 UTC
Can someone please verify that this code correctly gets the environment and
command line arguments for a task?

static void show_env(void)
{
        struct task_struct *task = current;
        char *cmdline;
        char *env;
        int len;
        /* get_task_mm() takes a reference on the mm; it is dropped at "out" */
        struct mm_struct *mm = get_task_mm(task);

        cmdline = kmalloc(PAGE_SIZE, GFP_KERNEL);
        env = kmalloc(PAGE_SIZE, GFP_KERNEL);

        if(!cmdline || !env || !mm)
                goto out;

        memset(cmdline, 0, PAGE_SIZE);
        memset(env, 0, PAGE_SIZE);

        /* copy at most PAGE_SIZE - 1 bytes so the buffers stay NUL-terminated */
        len = mm->arg_end - mm->arg_start;
        if(len >= PAGE_SIZE)
                len = PAGE_SIZE - 1;

        access_process_vm(task, mm->arg_start, cmdline, len, 0);

        len = mm->env_end - mm->env_start;
        if(len >= PAGE_SIZE)
                len = PAGE_SIZE - 1;

        access_process_vm(task, mm->env_start, env, len, 0);

        /* both areas hold NUL-separated strings, so %s shows only the first */
        printk("%s\n%s\n", env, cmdline);
out:
        if(mm)
                mmput(mm);
        kfree(cmdline);
        kfree(env);
}

If this is correct, then there is something wrong with how they are built inside
the kernel. 

For a crashing sed, it prints:
LESSKEY=/etc/.less
sed
For a crashing gcc nothing is printed.
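
One thing worth ruling out before blaming the kernel: the argument and environment areas hold several strings back to back, each terminated by a NUL byte, so a single "%s" stops at the first one. A minimal Python sketch of that effect; the sample bytes are made up for illustration (the real command line of the crashing sed is not known), and both function names are hypothetical:

```python
def first_string(area):
    """What a C-style "%s" would show: everything up to the first NUL."""
    return area.split(b"\0", 1)[0].decode("ascii", "replace")


def all_strings(area):
    """Every NUL-terminated string stored in the area, in order."""
    return [s.decode("ascii", "replace") for s in area.split(b"\0") if s]


# Hypothetical contents of an argument area and an environment area.
sample_args = b"sed\0-e\0s/foo/bar/\0"
sample_env = b"LESSKEY=/etc/.less\0HOME=/root\0"

print("as %s :", first_string(sample_env), "|", first_string(sample_args))
print("full  :", all_strings(sample_env), all_strings(sample_args))
```

Seen this way, output of just "LESSKEY=/etc/.less" and "sed" is what one would expect even if the areas are intact; printing each string in turn (or replacing the NULs with spaces before the printk) would show whether anything is actually missing.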
Comment 70 Bongani Hlope 2005-08-16 14:39:13 UTC
It seems like something else is causing programs to segfault. I've been trying to
build gcc (all of gcc) with debugging symbols but....

Kernel 2.6.12 and 2.6.13-rc5 just hardlock while compiling. This is with
randomize_va_space set to zero. Kernel 2.6.11 fails and throws these errors

libstdc++.so.6.[26058]: segfault at 0000000000000001 rip 0000000000000001 rsp 00007ffffffff228 error 14
libstdc++.so.6.[21210]: segfault at 0000000000000001 rip 0000000000000001 rsp 00007ffffffff228 error 14
atkbd.c: Keyboard on isa0060/serio0 reports too many keys pressed.
cls_1_1byte.exe[24190]: segfault at 00007f000000000a rip 00002aaaaabc33a4 rsp 00007fffffffc6e0 error 4
cls_24byte.exe[24256]: segfault at 0000000000000000 rip 0000000000400acb rsp 00007fffffffc4a0 error 4
cls_64byte.exe[24419]: segfault at 0000000000000000 rip 0000000000400abe rsp 00007fffffffc220 error 4
nested_struct.e[25219]: segfault at 0000000000000000 rip 0000000000400ad2 rsp 00007fffffffc460 error 4
nxagent[2069] general protection rip:460d46 rsp:7ffffffff970 error:0
nxagent[17660] general protection rip:460d46 rsp:7ffffffff970 error:0
nxagent[4548] general protection rip:460d46 rsp:7ffffffff970 error:0
libstdc++.so.6.[19376]: segfault at 0000000000000001 rip 0000000000000001 rsp 00007ffffffff228 error 14

kernel 2.6.10 fails with these errors

cls_1_1byte.exe[8848]: segfault at 000000000000000a rip 0000002a9566e3a4 rsp 0000007fbfffc6e0 error 4
cls_24byte.exe[8879]: segfault at 0000000000000000 rip 0000000000400acb rsp 0000007fbfffc4a0 error 4
cls_64byte.exe[9004]: segfault at 0000000000000000 rip 0000000000400abe rsp 0000007fbfffc220 error 4
nested_struct.e[9777]: segfault at 0000000000000000 rip 0000000000400ad2 rsp 0000007fbffff460 error 4
libstdc++.so.6.[590]: segfault at 0000000000000001 rip 0000000000000001 rsp 0000007fbffff228 error 14

I'll try to upgrade my userspace to Mandrake 2006 beta and see if this goes away
(this will take a while on my 56k connection); after that I'll start ripping
components out of my PC to see if it's a hardware bug.
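
The "error 4" and "error 14" fields in the segfault lines above are x86 page-fault error codes, and decoding them helps separate these faults from the general-protection ones. A minimal sketch; it assumes the kernel prints the code in hexadecimal (so "14" means 0x14), which fits the lines where the fault address equals rip, and decode_pf_error is a hypothetical helper:

```python
# (mask, meaning when the bit is set, meaning when it is clear)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page tables", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_text):
    """Decode the error field of a 'segfault at ...' log line."""
    code = int(code_text, 16)  # assumption: the field is logged in hex
    meanings = []
    for mask, when_set, when_clear in PF_BITS:
        if code & mask:
            meanings.append(when_set)
        elif when_clear is not None:
            meanings.append(when_clear)
    return meanings


for field in ("4", "14"):
    print("error %s: %s" % (field, ", ".join(decode_pf_error(field))))
```

Read that way, "error 4" is a user-mode read of an unmapped address (the NULL and 0xa dereferences), and "error 14" is a user-mode instruction fetch from an unmapped address, i.e. a jump to address 1. Both look like ordinary bad-pointer crashes rather than the ld.so GPFs discussed earlier.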
Comment 71 Phillip Hoerter 2005-08-21 16:07:29 UTC
This is from the GIMP repro on a stock kernel 2.6.13-rc6 + Bongani Hlope's traps
patch.

Aug 21 18:03:04 gondor sed[24899] general protection rip:40870a rsp:7fffff918680
error:0
Aug 21 18:03:04 gondor 
Aug 21 18:03:04 gondor Modules linked in: nvidia w83627hf i2c_sensor i2c_isa
i2c_viapro i2c_core snd_seq_midi snd_emu10k1_synth snd_emux_synth
snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_seq snd_pcm_oss
snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm
snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore ehci_hcd uhci_hcd
usbcore evdev
Aug 21 18:03:04 gondor Pid: 24899, comm: sed Tainted: P      2.6.13-rc6
Aug 21 18:03:04 gondor RIP: 0033:[<000000000040870a>] [<000000000040870a>]
Aug 21 18:03:04 gondor RSP: 002b:00007fffff918680  EFLAGS: 00010213
Aug 21 18:03:04 gondor RAX: 0000000000000000 RBX: 342e342e332d7073 RCX:
7300652d002f2f58
Aug 21 18:03:04 gondor RDX: 0000000000000001 RSI: 0000000000000031 RDI:
342e342e332d7073
Aug 21 18:03:04 gondor RBP: 0000000000000031 R08: fefefefefefefeff R09:
6565656565656565
Aug 21 18:03:04 gondor R10: 00002aaaaabc09a8 R11: 00002aaaaac2f230 R12:
0000000000000005
Aug 21 18:03:04 gondor R13: 00007fffff9188a0 R14: 342e33206f6f746e R15:
0000000000000000
Aug 21 18:03:04 gondor FS:  00002aaaaade6ae0(0000) GS:ffffffff80380800(0000)
knlGS:0000000056ed8bb0
Aug 21 18:03:04 gondor CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 21 18:03:04 gondor CR2: 00000000004086f0 CR3: 0000000019875000 CR4:
00000000000006a0
Aug 21 18:03:04 gondor 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
Aug 21 18:03:04 gondor /proc/$$/maps:
Aug 21 18:03:04 gondor 00400000-0041b000 r-xp 00000000 08:03 1179722 /bin/sed
Aug 21 18:03:04 gondor 0051a000-0051b000 rw-p 0001a000 08:03 1179722 /bin/sed
Aug 21 18:03:04 gondor 0051b000-00544000 rw-p 0051b000 08:03 1179722 [heap]
Aug 21 18:03:04 gondor 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:03 880828
/lib64/ld-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaaac0000-2aaaaaac1000 rw-p 2aaaaaac0000 08:03 880828 
Aug 21 18:03:04 gondor 2aaaaabbf000-2aaaaabc0000 r--p 00014000 08:03 880828
/lib64/ld-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 08:03 880828
/lib64/ld-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaabc1000-2aaaaacdd000 r-xp 00000000 08:03 880850
/lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaacdd000-2aaaaaddc000 ---p 0011c000 08:03 880850
/lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaaddc000-2aaaaaddf000 r--p 0011b000 08:03 880850
/lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaaddf000-2aaaaade2000 rw-p 0011e000 08:03 880850
/lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaade2000-2aaaaade8000 rw-p 2aaaaade2000 08:03 880850 
Aug 21 18:03:04 gondor 7fffff906000-7fffff91c000 rw-p 7fffff906000 08:03 880850
[stack]
Aug 21 18:03:04 gondor 


Hope this helps.

# cat /proc/sys/kernel/randomize_va_space 
1
Comment 72 Jules Colding 2005-08-25 01:04:09 UTC
Replying to my own comment #51... I have to add that the segfault happens even
though I have put 0 (zero) into "/proc/sys/kernel/randomize_va_space", but
much more rarely. I got this in dmesg today when doing "make install" for some
package:

### snip ###
[81394.522481] sed[30821] general protection rip:40870a rsp:7fffffb18070 error:0
[84745.026843] mkdir[18152]: segfault at 0000000000000000 rip 000000000040184d rsp 00007fffffffdd20 error 4
Comment 73 Tim Weippert 2005-09-02 03:10:01 UTC
I can confirm that on my machine (SUN V20z -> Dual Opteron 248) the setting
makes the general protection messages disappear.

I run 2.6.13 (vanilla).
Comment 74 Andrew Walrond 2005-09-02 08:05:13 UTC
I have just found this thread after a week of pulling my hair out after a 
simple kernel upgrade. My input (for what it's worth):  
 Tyan K8W, dual opteron 250, NUMA,2Gb/cpu 
 All vanilla kernels <= 2.6.11.12 OK 
 All vanilla kernels >= 2.6.12 (including 2.6.13) BROKEN 
 
My test: Building gcc 3.3.6 from local terminal. Typical GPFs: 
log-2005-09-01-22:54:13:Sep  1 20:38:00 [kernel] sed[7623] general protection rip:404b75 rsp:7fffffd1a910 error:0 
log-2005-09-01-22:54:13:Sep  1 21:40:28 [kernel] sed[8603] general protection rip:404b75 rsp:7ffffff191d0 error:0 
 
I assume disabling randomize_va_space will fix it although I have yet to try. 
Comment 75 Arjan van de Ven 2005-09-02 08:10:28 UTC
On Fri, Sep 02, 2005 at 08:05:42AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> ------- Additional Comments From andrew@walrond.org  2005-09-02 08:05 -------
> I have just found this thread after a week of pulling my hair out after a 
> simple kernel upgrade. My input (for what its worth):  
>  Tyan K8W, dual opteron 250, NUMA,2Gb/cpu 
>  All vanilla kernels <= 2.6.11.12 OK 
>  All vanilla kernels >= 2.6.12 (including 2.6.13) BROKEN 

please mention your distro
(if gentoo, are you using that preload lib?)

Comment 76 Andrew Walrond 2005-09-02 10:12:42 UTC
Distro is Heretix (http://www.h-e-r-e-t-i-x.org). A very simple from-source 
affair, nothing fancy at all; no preload lib, totally vanilla kernel(s), 
glibc-2-3-branch (20050826,nptl), binutils-2-16, gcc-3.4.4. Totally reliable in 
all respects with linux < 2.6.12. 
 
I can also now confirm that, as expected, 2.6.13 seems fine with 
randomize_va_space disabled (gcc just built twice without any problems; I was 
previously unable to complete a single build despite extensive tries with 
kernels 2.6.12.* - 2.6.13). 
 
Given the week I just had isolating this problem, I'm happy to help squash 
it :) If there is anything more I can do, let me know. 
Comment 77 Andrew Walrond 2005-09-02 12:26:13 UTC
Scratch that. After more extensive build tests (of a new distro) I suddenly got 
loads of: 
 
Sep  2 18:40:05 [kernel] as[11060]: segfault at 0000000000000000 rip 00000000004001a0 rsp 00007fffffffe570 error 6 
Sep  2 18:40:06 [kernel] as[11954]: segfault at 0000000000000000 rip 00000000004001a0 rsp 00007fffffffe570 error 6 
Sep  2 18:40:15 [kernel] as[16698]: segfault at 0000000000000000 rip 00000000004001a0 rsp 00007fffffffd4f0 error 6 
 
while building binutils. This is vanilla untainted 2.6.13 with 
$ cat /proc/sys/kernel/randomize_va_space 
0 
 
I guess I'm stuck with 2.6.11.12 until this is resolved :( 
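
For readers decoding these log lines: the number after "error" in a segfault
message is the x86 page-fault error code, which is a bit mask. The sketch below
splits out its low three bits; the macro, struct and function names are chosen
for illustration only. Under that reading, "error 6" above is a user-mode write
to a page that was not present, and the "error 4" in comment #72 is a user-mode
read of a not-present page. The "error:0" on the general protection lines
belongs to a different exception and is not covered by this decoder.

```c
#include <assert.h>
#include <stdio.h>

/* x86 page-fault error code bits, as pushed by the CPU on a #PF exception. */
#define PF_PROT  0x1UL /* set: protection violation; clear: page not present */
#define PF_WRITE 0x2UL /* set: write access; clear: read access */
#define PF_USER  0x4UL /* set: fault raised in user mode; clear: kernel mode */

struct pf_info {
    int protection; /* 1 if the page was present but the access was denied */
    int write;      /* 1 if the faulting access was a write */
    int user;       /* 1 if the fault happened in user mode */
};

/* Split the low three bits of a page-fault error code into named fields. */
struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;

    info.protection = (code & PF_PROT) != 0;
    info.write = (code & PF_WRITE) != 0;
    info.user = (code & PF_USER) != 0;
    return info;
}

/* Print a one-line, human-readable summary of the error code to out. */
void print_pf_error(FILE *out, unsigned long code)
{
    struct pf_info info = decode_pf_error(code);

    assert(out != NULL);
    fprintf(out, "error %lu: %s-mode %s, page %s\n", code,
            info.user ? "user" : "kernel",
            info.write ? "write" : "read",
            info.protection ? "present (protection violation)" : "not present");
}
```

Calling print_pf_error(stdout, 6) prints a summary for the "as" crashes above.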
Comment 78 Andrew Walrond 2005-09-02 13:41:51 UTC
Since I have a definite "2.6.11.12 works and 2.6.12 doesn't" situation, and a 
reliable test, perhaps a binary search/reversion of patches from 2.6.12 -> 
2.6.11.12 would be useful? I haven't gotten round to using git yet, but I guess 
it would make the process relatively painless? 
 
Alternatively, perhaps you guys have some likely culprits in mind that I could 
revert? 
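
The binary search proposed here can be sketched as follows. This is only an
illustration of the idea, assuming the patches between the good and the bad
kernel can be applied in order and that "bad" stays bad once the culprit is
applied. The callback type, first_bad_patch and simulated_test are hypothetical
names; in practice the callback would mean "build and boot a kernel with the
first count patches and run the gcc build test", repeated enough times to trust
a pass, since the crashes are intermittent.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if a kernel with the first `count` patches applied on top
 * of the known-good base shows the bug. Hypothetical callback. */
typedef int (*is_bad_fn)(size_t count, void *ctx);

/* Binary search for the first bad patch among n ordered patches.
 * Assumes: 0 patches applied is good, all n applied is bad, and badness is
 * monotonic. Returns the 0-based index of the patch that introduces the bug. */
size_t first_bad_patch(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0; /* largest patch count known to be good */
    size_t bad = n;  /* smallest patch count known to be bad */

    assert(n > 0 && is_bad != NULL);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad - 1;
}

/* Stand-in for a real build-and-test cycle: the bug appears as soon as the
 * patch with index *ctx has been applied. For demonstration only. */
int simulated_test(size_t count, void *ctx)
{
    size_t culprit = *(size_t *)ctx;

    return count > culprit;
}
```

With n patches this needs about log2(n) build-and-test cycles instead of n.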
Comment 79 Andi Kleen 2005-09-02 13:56:23 UTC
Prime suspect is randomize_va_mappings

And if it's that it's likely buggy user space too, not necessarily a kernel bug.
Comment 80 Andrew Walrond 2005-09-02 14:09:57 UTC
But since it still occurs with randomize_va_mappings disabled, (echo 0 >) isn't 
this unlikely? 
 
I guess applying the randomize_va_mappings patch to 2.6.11.12 and testing that 
might be a quick route to some answers. Can someone with the relevant git setup 
extract and send me the relevant patch? 
 
Comment 81 Arjan van de Ven 2005-09-02 23:17:34 UTC
On Fri, Sep 02, 2005 at 02:10:06PM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> But since it still occurs with randomize_va_mappings disabled, (echo 0 >) isn't 
> this unlikely? 

correct

  
> I guess applying the randomize_va_mappings patch to 2.6.11.12 and testing that 
> might be a quick route to some answers. Can someone with the relevent git setup 
> extract and send me the relevent patch? 

http://www.kernel.org/pub/linux/kernel/people/arjan/randomize/

has them broken out

another suspect would be 4 level pagetables; that got merged about the same
time.

Comment 82 Bongani Hlope 2005-09-03 04:20:34 UTC
Sorry for the big post. I just did a quick trace and haven't analyzed any of
this yet.

sed[23074] general protection rip:40870a rsp:7fffffb18510 error:0
Env start: 7fffffb1a543, Args start : 7fffffb1a520
LC_PAPER=en_US
/bin/sed

Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1
snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3
ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv
video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core
videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal
processor hotkey fan button ac
Pid: 23074, comm: sed Tainted: G   M  2.6.13-rc5
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffb18510  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 7300652d002f2f58
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffb18730 R14: 20322e3031207875 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80574800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 000000002aebe000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976
/usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937
/usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986
/usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161
/usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930
/usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311
7fffffb06000-7fffffb1c000 rw-p 7fffffb06000 03:06 11911311 [stack]

[bongani@bongani64 kdelibs]$ gdb /bin/sed -c core.23074
GNU gdb 6.3-3.1.102mdk (Mandrakelinux)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-mandrake-linux-gnu"...Using host libthread_db
library "/lib64/tls/libthread_db.so.1".

Core was generated by `/bin/sed -e 1s/^X// -e s%/[^/]*$%%'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  add1_buffer (b=0x554e4728203a4343, c=49) at utils.c:503
503           if (b->allocated - b->length < 1)
(gdb) bt
#0  add1_buffer (b=0x554e4728203a4343, c=49) at utils.c:503
#1  0x00000000004028d9 in add_then_next (b=0x554e4728203a4343, ch=49) at compile.c:300
#2  0x0000000000403676 in read_text (buf=0x0, leadin_ch=49) at compile.c:904
#3  0x0000000000404129 in compile_program (vector=0x20322e3031207875) at compile.c:1036
#4  0x000000000040495a in compile_string (cur_program=0x554e4728203a4343, str=0x31 <Address 0x31 out of bounds>, len=1) at compile.c:1568
#5  0x00000000004025e1 in main (argc=5, argv=0x7fffffb18738) at sed.c:212
(gdb)                                                                      
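
One possible reading of why this crash is logged as "general protection" rather
than "segfault": on x86-64, a data access through a non-canonical address raises
a general protection fault instead of a page fault, and the pointer
b=0x554e4728203a4343 is not canonical for 48-bit virtual addresses. The check
below is a minimal sketch; is_canonical and its va_bits parameter are
illustrative names, and 48 bits is assumed for the Opterons in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is canonical for a CPU implementing va_bits bits of
 * virtual address, i.e. bits 63 down to va_bits-1 are all zeros or all ones.
 * Returns 0 otherwise. */
int is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t upper;

    assert(va_bits >= 2 && va_bits <= 64);
    upper = addr >> (va_bits - 1); /* keep bit va_bits-1 and everything above */
    return upper == 0 || upper == (UINT64_MAX >> (va_bits - 1));
}
```

With va_bits set to 48, the bad pointer above fails the check, while the RIP
and RSP values from the same dump pass it.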


static void read_text P_((struct text_buf *buf, int leadin_ch));
static void
read_text(buf, leadin_ch)
  struct text_buf *buf;
  int leadin_ch;
{
  int ch;

  /* Should we start afresh (as opposed to continue a partial text)? */
  if (buf)
    {
      if (pending_text)
        free_buffer(pending_text);
      pending_text = init_buffer();
      buf->text = NULL;
      buf->text_length = 0;
      old_text_buf = buf;
    }
  /* assert(old_text_buf != NULL); */

  if (leadin_ch == EOF)
    return;

  if (leadin_ch != '\n')
    add1_buffer(pending_text, leadin_ch);

  ch = inchar();
  while (ch != EOF && ch != '\n')
    {
      if (ch == '\\')
        {
          ch = inchar();
          if (ch != EOF)
            add1_buffer (pending_text, '\\');
        }

      if (ch == EOF)
        {
          add1_buffer (pending_text, '\n');
          return;
        }

      ch = add_then_next (pending_text, ch); <------- compile.c:904

-------- sed.c -------------------
int
main(argc, argv)
  int argc;
  char **argv;
{
#ifdef REG_PERL
#define SHORTOPTS "snrRue:f:l:i::V:"
#else
#define SHORTOPTS "snrue:f:l:i::V:"
#endif

  static struct option longopts[] = {
    {"regexp-extended", 0, NULL, 'r'},
#ifdef REG_PERL
    {"regexp-perl", 0, NULL, 'R'},
#endif
    {"expression", 1, NULL, 'e'},
    {"file", 1, NULL, 'f'},
    {"in-place", 2, NULL, 'i'},
    {"line-length", 1, NULL, 'l'},
    {"quiet", 0, NULL, 'n'},
    {"posix", 0, NULL, 'p'},
    {"silent", 0, NULL, 'n'},
    {"separate", 0, NULL, 's'},
    {"unbuffered", 0, NULL, 'u'},
    {"version", 0, NULL, 'v'},
    {"help", 0, NULL, 'h'},
    {NULL, 0, NULL, 0}
  };

  int opt;
  int return_code;
  const char *cols = getenv("COLS");

  initialize_main (&argc, &argv);
#if HAVE_SETLOCALE
  /* Set locale according to user's wishes.  */
  setlocale (LC_ALL, "");
#endif
  initialize_mbcs ();

#if ENABLE_NLS

  /* Tell program which translations to use and where to find.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
#endif

  if (getenv("POSIXLY_CORRECT") != NULL)
    posixicity = POSIXLY_CORRECT;
  else
    posixicity = POSIXLY_EXTENDED;

  /* If environment variable `COLS' is set, use its value for
     the baseline setting of `lcmd_out_line_len'.  The "-1"
     is to avoid gratuitous auto-line-wrap on ttys.
   */
  if (cols)
    {
      countT t = ATOI(cols);
      if (t > 1)
        lcmd_out_line_len = t-1;
    }

  myname = *argv;
  while ((opt = getopt_long(argc, argv, SHORTOPTS, longopts, NULL)) != EOF)
    {
      switch (opt)
        {
        case 'n':
          no_default_output = true;
          break;
        case 'e':
the_program = compile_string(the_program, optarg, strlen(optarg)); <--sed.c:212 
    
-------- compile.c ----------

/* `str' is a string (from the command line) that contains a sed command.
   Compile the command, and add it to the end of `cur_program'. */
struct vector *
compile_string(cur_program, str, len)
  struct vector *cur_program;
  char *str;
  size_t len;
{
  static countT string_expr_count = 0;
  struct vector *ret;

  prog.file = NULL;
  prog.base = CAST(unsigned char *)str;
  prog.cur = prog.base;
  prog.end = prog.cur + len;

  cur_input.line = 0;
  cur_input.name = NULL;
  cur_input.string_expr_count = ++string_expr_count;

  ret = compile_program(cur_program);  <----- compile.c:1568
  prog.base = NULL;
  prog.cur = NULL;
  prog.end = NULL;

  first_script = false;
  return ret;
}


/* Read a program (or a subprogram within `{' `}' pairs) in and store
   the compiled form in `*vector'.  Return a pointer to the new vector.  */
static struct vector *compile_program P_((struct vector *));
static struct vector *
compile_program(vector)
  struct vector *vector;
{
  struct sed_cmd *cur_cmd;
  struct buffer *b;
  int ch;

  if (!vector)
    {
      vector = MALLOC(1, struct vector);
      vector->v = NULL;
      vector->v_allocated = 0;
      vector->v_length = 0;

      obstack_init (&obs);
    }
  if (pending_text)
    read_text(NULL, '\n'); <------------ compile.c:1036
    

static int add_then_next P_((struct buffer *b, int ch));
static int
add_then_next(b, ch)
  struct buffer *b;
  int ch;
{
  add1_buffer(b, ch);  <------------- compile.c:300
  return inchar();
}

-------- utils.c ------------
char *
add1_buffer(b, c)
  struct buffer *b;
  int c;
{
  /* This special case should be kept cheap;
   *  don't make it just a mere convenience
   *  wrapper for add_buffer() -- even "builtin"
   *  versions of memcpy(a, b, 1) can become
   *  expensive when called too often.
   */
  if (c != EOF)
    {
      char *result;
      if (b->allocated - b->length < 1) <--- utils.c:503
        resize_buffer(b, b->length+1);
      result = b->b + b->length++;
      *result = c;
      return result;
    }

  return NULL;
}
Comment 83 Arjan van de Ven 2005-09-03 05:32:46 UTC
On Sat, Sep 03, 2005 at 04:20:48AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> Pid: 23074, comm: sed Tainted: G   M  2.6.13-rc5

you had a machine check first... what is that about? Does this happen as
well without machine check for you?

Comment 84 Bongani Hlope 2005-09-03 13:26:05 UTC
That happens every time I boot up (the Machine check)... about 4 seconds after 
the system is fully loaded. I don't know what it means though. I tried to 
search for it on google but nothing came up... It's been happening for a long 
time.
Comment 85 Chris Caputo 2005-09-03 13:38:00 UTC
On Sat, 3 Sep 2005, bonganilinux@mweb.co.za wrote:
> That happens everytime I boot up (the Machine check)... about 4 seconds 
> after the system is fully loaded. I don't know what it means though. I 
> tried to search for it on google but nothing came up... It been 
> happening for a long time.

Install and run mcelog after you see one of these.

Chris

Comment 86 Bongani Hlope 2005-09-03 14:11:29 UTC
MCE log output

[root@bongani64 linux-2.6]# mcelog --k8
MCE 0
CPU 0 1 instruction cache from boot or resume
ADDR ff3fffffff3ffdaf
  Instruction cache ECC error
       bit32 = err cpu0
       bit33 = err cpu1
       bit34 = res2
       bit35 = res3
       bit39 = res7
       bit40 = error found by scrub
       bit41 = res9
       bit42 = res10
       bit43 = res11
       bit44 = res12
       bit45 = uncorrected ecc error
       bit46 = corrected ecc error
       bit57 = processor context corrupt
       bit59 = misc error valid
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
STATUS fe37ffbfff3fffff MCGSTATUS 0
MCE 1
CPU 0 2 bus unit from boot or resume
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
STATUS f00040000000c8ff MCGSTATUS 0
MCE 2
CPU 0 3 load/store unit from boot or resume
MISC 8005003b8005003b ADDR ffb6fdfbff
       bit59 = misc error valid
       bit61 = error uncorrected
STATUS bc0000000000c843 MCGSTATUS 0
MCE 3
CPU 1 0 data cache from boot or resume
ADDR 7e5041006cc
  Data cache ECC error (syndrome f2) found by scrubber
       bit40 = error found by scrub
       bit45 = uncorrected ecc error
       bit46 = corrected ecc error
       bit57 = processor context corrupt
       bit59 = misc error valid
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
STATUS fe796100000002ea MCGSTATUS 0
MCE 4
CPU 1 1 instruction cache from boot or resume
ADDR fbffcff8ffffffff
  Instruction cache ECC error
       bit32 = err cpu0
       bit33 = err cpu1
       bit34 = res2
       bit35 = res3
       bit39 = res7
       bit40 = error found by scrub
       bit41 = res9
       bit42 = res10
       bit43 = res11
       bit44 = res12
       bit45 = uncorrected ecc error
       bit46 = corrected ecc error
       bit55 = res23
       bit56 = res24
       bit57 = processor context corrupt
       bit59 = misc error valid
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
STATUS ffffffffffffffff MCGSTATUS 0
MCE 5
CPU 1 2 bus unit from boot or resume
ADDR d3f9ffe98b
  L2 cache ECC error
  Cache tag array error
       bit46 = corrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
STATUS a600400000033dbe MCGSTATUS 0
MCE 6
CPU 1 3 load/store unit from boot or resume
MISC 8005003b8005003b ADDR ac86a04594
       bit57 = processor context corrupt
       bit59 = misc error valid
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
STATUS fe0000000000dccc MCGSTATUS 0
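
To cross-check the bitNN lines against the raw STATUS words, the sketch below
tests individual bits of a 64-bit status value. The bit numbers and labels in
the table are copied from the mcelog output above and nothing else is assumed
about the register layout; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct status_bit {
    unsigned bit;      /* bit position in the 64-bit STATUS word */
    const char *label; /* label as printed by mcelog in this thread */
};

static const struct status_bit status_bits[] = {
    { 40, "error found by scrub" },
    { 45, "uncorrected ecc error" },
    { 46, "corrected ecc error" },
    { 57, "processor context corrupt" },
    { 59, "misc error valid" },
    { 61, "error uncorrected" },
    { 62, "error overflow (multiple errors)" },
};

/* Returns 1 if the given bit of status is set, 0 otherwise. */
int status_bit_set(uint64_t status, unsigned bit)
{
    assert(bit < 64);
    return (int)((status >> bit) & 1);
}

/* Print one "bitNN = label" line for every known bit that is set. */
void print_status_bits(FILE *out, uint64_t status)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof status_bits / sizeof status_bits[0]; i++) {
        if (status_bit_set(status, status_bits[i].bit))
            fprintf(out, "       bit%u = %s\n",
                    status_bits[i].bit, status_bits[i].label);
    }
}
```

For STATUS f00040000000c8ff (MCE 1) this reports bits 46, 61 and 62, matching
the three lines mcelog printed for that entry.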
Comment 87 Andrew Walrond 2005-09-05 01:51:53 UTC
Ok; some recap, and some new results:   
System: Tyan K8W dual opteron, 64bit NUMA kernels, 2Gb/cpu.  
Symptoms: Random userland crashes during build of gcc-3.3  
All (vanilla, untainted) kernels <= 2.6.11.12 are entirely reliable  
  
I have now tested a vanilla 2.6.11.12 kernel, modified ONLY with Arjan's  
randomize_va_space patches. Results:  
 
With /proc/sys/kernel/randomize_va_space = 1:  
 Two failures on two separate gcc builds; 
 Sep  4 21:55:19 [kernel] cc1[15562] general protection rip:2aaaaaaac244 rsp:7ffffffbd990 error:0 
 Sep  5 09:04:12 [kernel] cc1[23242] general protection rip:2aaaaaaac244 rsp:7fffffdbd390 error:0 
 
With /proc/sys/kernel/randomize_va_space = 0: 
 No failures. This after 6 full gcc builds and a complete distro build from 
source. 
 
(Heretix allows distro builds with parallel package builds as well as parallel 
make jobs; --build-jobs=5 -make-jobs=4 is a very effective stress test) 
 
CONCLUSIONS 
1) 4 level pagetables are not responsible for this bug since they are not in 
this kernel 
2) As suspected, randomize_va_space is either directly causing the symptoms, or 
exposing a pre-existing problem 
 
I'm happy to test further, or make the test machine available via ssh. 
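
When repeating an A/B test like the one above, it can help to record the
setting from within the test harness itself. The sketch below reads an integer
value such as the one in /proc/sys/kernel/randomize_va_space. The function
names are hypothetical, and -1 is used here as an error marker on the
assumption that the file only ever holds a small non-negative number.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading integer from text. Returns -1 if none is found. */
int parse_sysctl_int(const char *text)
{
    int value;

    assert(text != NULL);
    if (sscanf(text, "%d", &value) != 1)
        return -1;
    return value;
}

/* Read the first line of the file at path and parse it as an integer.
 * Returns -1 if the file cannot be opened or holds no number. */
int read_sysctl_int(const char *path)
{
    char line[32];
    FILE *f;
    int value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) != NULL)
        value = parse_sysctl_int(line);
    fclose(f);
    return value;
}
```

A test script could call read_sysctl_int("/proc/sys/kernel/randomize_va_space")
before each build and log the result next to any crash messages.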
Comment 88 PaX Team 2005-09-05 17:18:22 UTC
actually, 4level page tables went into 2.6.11, not later and were enabled on
amd64. on another note, was anyone able to perform the suggested tests in
comment #63?
Comment 89 Andrew Walrond 2005-09-06 02:28:17 UTC
My bad. Previous comments led me to believe 4level page tables went in during 
2.6.12.*. However, if 4level page tables are indeed in 2.6.11.12, which has 
been reliable on my systems since its release, I guess the conclusions still 
stand. 
 
As to the tests in #63; I'd be happy to apply and test any patches, but I don't 
have enough (any) kernel programming knowledge to generate said patch. 
Comment 90 Bongani Hlope 2005-09-06 14:51:17 UTC
Sorry for the big post again. I used this patch to test point 5 of comment #63.
This is the lifetime of a gcc call that eventually GPFs. I hope this is helpful.

--- mm/memory.c.orig    2005-09-06 20:38:44.000000000 +0200
+++ mm/memory.c 2005-09-06 22:49:58.000000000 +0200
@@ -1800,6 +1800,25 @@
        return VM_FAULT_OOM;
 }

+static void print_params(struct mm_struct *mm, unsigned long address,
+               pte_t page_table, pte_t entry) {
+       struct task_struct *tsk = current;
+       if(!strcmp("gcc", tsk->comm) || !strcmp("rm", tsk->comm)) {
+
+               printk(KERN_INFO "print_params: %s[%d]\n", tsk->comm, tsk->pid);
+               printk(KERN_INFO "mm_struct: start_code=%lx, end_code=%lx",
+                               mm->start_code, mm->end_code);
+               printk(KERN_INFO "start_data=%lx, end_data=%lx, start_brk=%lx",
+                               mm->start_data, mm->end_data, mm->start_brk);
+               printk(KERN_INFO "brk=%lx, start_stack=%lx, arg_start=%lx",
+                               mm->brk, mm->start_stack, mm->arg_start);
+               printk(KERN_INFO "arg_end=%lx, env_start=%lx, env_end=%lx\n",
+                               mm->arg_end, mm->env_start, mm->env_end);
+               printk(KERN_INFO "page_table=%lx, address=%lx, entry=%lx\n",
+                               pte_val(page_table), address, pte_val(entry));
+       }
+}
+
 /*
  * do_no_page() tries to create a new page mapping. It aggressively
  * tries to share with existing pages, but makes a separate copy if
@@ -1901,6 +1920,7 @@
                entry = mk_pte(new_page, vma->vm_page_prot);
                if (write_access)
                        entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+               print_params(mm, address, *page_table, entry);
                set_pte_at(mm, address, page_table, entry);
                if (anon) {
                        lru_cache_add_active(new_page);
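
To help read the entry= values that this patch prints: they are raw x86-64 page
table entries, in which the low bits are flags, bits 12 to 51 hold the physical
page address, and bit 63 is the no-execute bit. A minimal decoder follows; the
macro, struct and function names are chosen for illustration and are not taken
from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_USER      (1ULL << 2)
#define PTE_ACCESSED  (1ULL << 5)
#define PTE_DIRTY     (1ULL << 6)
#define PTE_NX        (1ULL << 63)
#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* bits 12..51 */

struct pte_info {
    int present, writable, user, accessed, dirty, nx;
    uint64_t phys; /* physical address of the mapped 4 KB page */
};

/* Split a raw page table entry into its flag bits and physical address. */
struct pte_info decode_pte(uint64_t pte)
{
    struct pte_info info;

    info.present = (pte & PTE_PRESENT) != 0;
    info.writable = (pte & PTE_WRITABLE) != 0;
    info.user = (pte & PTE_USER) != 0;
    info.accessed = (pte & PTE_ACCESSED) != 0;
    info.dirty = (pte & PTE_DIRTY) != 0;
    info.nx = (pte & PTE_NX) != 0;
    info.phys = pte & PTE_ADDR_MASK;
    return info;
}

/* Print the decoded entry on one line. */
void print_pte(FILE *out, uint64_t pte)
{
    struct pte_info i = decode_pte(pte);

    assert(out != NULL);
    fprintf(out, "phys=%llx P=%d W=%d U=%d A=%d D=%d NX=%d\n",
            (unsigned long long)i.phys, i.present, i.writable, i.user,
            i.accessed, i.dirty, i.nx);
}
```

For example, 800000003b5a4067 decodes as present, writable, user, accessed,
dirty and no-execute with physical page 3b5a4000, while 7fa2d025 is a present,
read-only, user, accessed mapping without the no-execute bit.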



print_params: gcc[17689]
mm_struct:
 start_code=0
 end_code=0
 start_data=0
 end_data=0
 start_brk=518000
 brk=518000
 start_stack=7fffffb17df4
 arg_start=7fffffb17df4
 arg_end=0
 env_start=0
 env_end=0
page_table=0
address=5178c8
entry=800000003b5a4067


print_params: gcc[17689]
mm_struct:
 start_code=0
 end_code=0
 start_data=0
 end_data=0
 start_brk=518000
 brk=518000
 start_stack=7fffffb17df4
 arg_start=7fffffb17df4
 arg_end=0
 env_start=0
 env_end=0
page_table=0
address=2aaaaabc0868
entry=8000000063dc6067


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaaaba80
entry=7fa2d025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaaac130
entry=7fa2e025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabbffb8
entry=800000007f9e8025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab27b0
entry=7fa34025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab8a60
entry=7ee6d025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaabb894
entry=7f0ae025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaaba1c0
entry=7f0a7025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab9970
entry=7f0a6025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab3360
entry=7fe9c025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=400040
entry=6a284025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaaad005
entry=7fa2f025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=516de0
entry=8000000069d42025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaaaf540
entry=7fa31025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaabc374
entry=7f0af025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab4750
entry=7f9c1025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab5a00
entry=7f9c6025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab1190
entry=7fa33025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab69c0
entry=7fe9e025


print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaac0000
entry=800000007eea1025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaac42e8
entry=800000007eea5025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaac8624
entry=800000007eea9025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaad4725
entry=800000007eec5025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaaca7b4
entry=800000007eeab025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaad8559
entry=800000007eec9025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaacb87c
entry=800000007eeac025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaada488
entry=800000007eecb025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaad9426
entry=800000007eeca025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab08f0
entry=7fa32025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaab7ce0
entry=7f060025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaadeda18
entry=8000000040c26067

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaadebaa0
entry=800000007eefb025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc1290
entry=7eee5025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd405c
entry=7ef1d025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd5058
entry=7ef1e025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaade8f10
entry=800000004cb37067

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaade9000
entry=80000000420f3067

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd6000
entry=7ef1f025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd7000
entry=7ef20025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaadea000
entry=800000004425b067

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd8008
entry=7ef21025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd9000
entry=7ef22025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabda000
entry=7ef23025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaadec1e0
entry=8000000034bf2067

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabdb008
entry=7ef24025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabdc000
entry=7ef25025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabcc8f4
entry=7ef15025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd2e9c
entry=7ef1b025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabce878
entry=7ef17025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd3b71
entry=7ef1c025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc37b0
entry=7eee7025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc62dc
entry=7eeea025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd0457
entry=7ef19025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabcd060
entry=7ef16025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabcbab8
entry=7ef14025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc83c8
entry=7eeec025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabd11cd
entry=7ef1a025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc26e8
entry=7eee6025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabcf40e
entry=7ef18025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabca408
entry=7ef13025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc4ea0
entry=7eee8025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc72d0
entry=7eeeb025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc9640
entry=7ef12025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabc5004
entry=7eee9025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabdd000
entry=7ef26025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=401010
entry=6a285025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac851b0
entry=7ef3d025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac30c90
entry=7ef5d025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabf1c60
entry=7ef7a025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=403290
entry=6a287025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabeefa0
entry=7ef77025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=40cc90
entry=6a100025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=402dd0
entry=6a286025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabe6690
entry=7ef6f025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=413de6
entry=6a107025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaacc533a
entry=7e883025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaaccd692
entry=7e88b025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaacb8d60
entry=7e876025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabe8860
entry=7ef71025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac296b0
entry=7ef56025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabe9710
entry=7ef72025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=4157ab
entry=69d41025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac2b6e0
entry=7ef58025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac2c7a0
entry=7ef59025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac7e2f0
entry=7ef36025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaacb7290
entry=7e875025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac90a90
entry=7ef48025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac28850
entry=7ef55025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac2a000
entry=7ef57025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac2e520
entry=7ef5b025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=518000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac7db00
entry=7ef35025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac32200
entry=7ef5f025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabec300
entry=7ef75025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabef050
entry=7ef78025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac31b00
entry=7ef5e025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac2fb80
entry=7ef5c025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=4149c0
entry=69d40025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=40d638
entry=6a101025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=405040
entry=6a289025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabfaaa0
entry=7ef83025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=4043d0
entry=6a288025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=40ef40
entry=6a102025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=40f002
entry=6a103025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac81e70
entry=7ef39025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=406b14
entry=6a28a025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=407654
entry=6a28b025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac08b20
entry=7e8d4025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac002e0
entry=7ef89025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac19890
entry=7e8a3025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac268c0
entry=7ef53025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaacc730c
entry=7e885025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac012d8
entry=7e8cd025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac0391f
entry=7e8cf025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac258e0
entry=7ef52025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac24fa0
entry=7ef51025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac78230
entry=7ef30025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=411e20
entry=6a105025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac83480
entry=7ef3b025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaabea000
entry=7ef73025

print_params: gcc[17689]
mm_struct:
 start_code=400000
 end_code=416db4
 start_data=516db8
 end_data=5178c8
 start_brk=518000
 brk=539000
 start_stack=7fffffb15990
 arg_start=7fffffb17df4
 arg_end=7fffffb17e16
 env_start=7fffffb17e16
 env_end=7fffffb17e9a
page_table=0
address=2aaaaac21c90
entry=7ef4e025

gcc[17689] general protection rip:404498 rsp:7fffffb15710 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 17689, comm: gcc Not tainted 2.6.13
RIP: 0033:[phys_startup_64+3163032/2147483392] [phys_startup_64+3163032/2147483392]
RIP: 0033:[<0000000000404498>] [<0000000000404498>]
RSP: 002b:00007fffffb15710  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000051b6b8 RCX: 000000000051b5d0
RDX: 000000000051b5b0 RSI: 00002aaaaadee808 RDI: 0000000001000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 00002aaaaadee620 R11: 0000000001000010 R12: 000000000051b500
R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaac21c90 CR3: 000000004887e000 CR4: 00000000000006e0
 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66
/proc/$$/maps:
00400000-00417000 r-xp 00000000 03:06 7375558 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375558 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 03:06 7375558 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311
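
A note for reading the entry= values in the print_params output above. Assuming they are raw x86-64 page-table entries (the debug patch that prints them is not shown in this comment), the low bits and bit 63 follow the architectural layout. Entries ending in 025 would then be present, user-accessible, accessed pages; entries ending in 067 would also be writable and dirty; a leading 8 means the no-execute bit is set. A minimal sketch of that decoding, where pte_describe and pte_phys are hypothetical helper names and not kernel functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural x86-64 page-table entry bits (4 KiB pages). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_NX       (1ULL << 63)
#define PTE_PFN_MASK 0x000ffffffffff000ULL  /* physical address, bits 12..51 */

/* Hypothetical helper: write a summary such as "P RW US A D NX" into buf. */
static void pte_describe(uint64_t pte, char *buf, size_t len)
{
    static const struct { uint64_t bit; const char *name; } flags[] = {
        { PTE_PRESENT, "P" }, { PTE_RW, "RW" }, { PTE_USER, "US" },
        { PTE_ACCESSED, "A" }, { PTE_DIRTY, "D" }, { PTE_NX, "NX" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(pte & flags[i].bit))
            continue;
        /* Separate flag names with a single space. */
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* buffer too small: stop appending */
        used += (size_t)n;
    }
}

/* Hypothetical helper: physical address of the page the entry points at. */
static uint64_t pte_phys(uint64_t pte)
{
    return pte & PTE_PFN_MASK;
}
```

With these helpers, entry=800000007eecb025 decodes to "P US A NX" at physical address 0x7eecb000, and entry=8000000040c26067 decodes to "P RW US A D NX".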

Comment 91 Bongani Hlope 2005-09-07 10:38:27 UTC
Hi

Still testing comment #63. With this patch (add local TLB flush into do_no_page):

diff  -uNpr mm/memory.c.orig mm/memory.c
--- mm/memory.c.orig    2005-09-07 18:40:08.000000000 +0200
+++ mm/memory.c 2005-09-07 18:40:29.000000000 +0200
@@ -1790,6 +1790,8 @@ do_anonymous_page(struct mm_struct *mm,
        set_pte_at(mm, addr, page_table, entry);
        pte_unmap(page_table);

+       flush_tlb();
+
        /* No need to invalidate - it was non-present before */
        update_mmu_cache(vma, addr, entry);
        lazy_mmu_prot_update(entry);

I still get these (I'll test adding wbinvd next):

grep[10532]: segfault at 000000000000000e rip 000000000040da42 rsp 00007fffffd12be0 error 4
sed[12834] general protection rip:40870a rsp:7fffffd189a0 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 12834, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd189a0  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 00000000002f2f58
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffd18bc0 R14: 20322e3031207875 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 00000000164ae000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311

sed[27540] general protection rip:40870a rsp:7fffffb188d0 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 27540, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffb188d0  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 7300652d002f2f58
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffb18af0 R14: 20322e3031207875 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000005af24c CR3: 00000000449dc000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976
/usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937
/usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986
/usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161
/usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930
/usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311

sed[27933] general protection rip:40870a rsp:7fffffd18f50 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 27933, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd18f50  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 7300652d002f2f58
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffd19170 R14: 20322e3031207875 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 00000000603ab000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976
/usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937
/usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986
/usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161
/usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930
/usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311

rm[32530] general protection rip:2aaaaac32260 rsp:7fffffd07f48 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 32530, comm: rm Not tainted 2.6.13
RIP: 0033:[<00002aaaaac32260>] [<00002aaaaac32260>]
RSP: 002b:00007fffffd07f48  EFLAGS: 00010287
RAX: 65722e006e79642e RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 00007fffffd0a51e RDI: 65722e006e79642e
RBP: 00007fffffd07fa0 R08: 00007fffffd07fbc R09: 00007fffffd07fb8
R10: 65722e006e79642e R11: 00002aaaaac32200 R12: 0000000000407280
R13: 00007fffffd081e0 R14: 0000000000000000 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaac31c10 CR3: 0000000054b92000 CR4: 00000000000006e0
 f3 a4 4c 89 d0 c3 90 90 90 90 90 90 90 90 90 90 48 89 d0 48
/proc/$$/maps:
00400000-0040a000 r-xp 00000000 03:06 5668904 /bin/rm
0050a000-0050b000 rw-p 0000a000 03:06 5668904 /bin/rm
0050b000-0052c000 rw-p 0050b000 03:06 5668904 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311

sed[6922] general protection rip:40870a rsp:7fffffd196d0 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 6922, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd196d0  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 000000002d002f2f
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffd198f0 R14: 20322e3031207875 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 000000007c113000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976
/usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937
/usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986
/usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161
/usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930
/usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311
Comment 92 Bongani Hlope 2005-09-07 11:08:21 UTC
Hi

Still testing comment #63. With this patch (add an explicit wbinvd call in
do_no_page)

diff  -uNpr mm/memory.c.orig mm/memory.c
--- mm/memory.c.orig    2005-09-07 18:40:08.000000000 +0200
+++ mm/memory.c 2005-09-07 19:42:34.000000000 +0200
@@ -1790,6 +1790,8 @@ do_anonymous_page(struct mm_struct *mm,
        set_pte_at(mm, addr, page_table, entry);
        pte_unmap(page_table);

+       wbinvd();
+
        /* No need to invalidate - it was non-present before */
        update_mmu_cache(vma, addr, entry);
        lazy_mmu_prot_update(entry);

and I still get these, so that's it for comment #63...

[OT] The Machine Check Exceptions seem to occur when I boot with a CD inside
the CDROM drive or when I have a USB mem stick plugged in.


sed[10681] general protection rip:40870a rsp:7fffffd18560 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 10681, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd18560  EFLAGS: 00010217
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 2f706f63642f2e00
RDX: 0000000000000001 RSI: 0000000000000032 RDI: 554e4728203a4343
RBP: 0000000000000032 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000004
R13: 00007fffffd18780 R14: 20322e3031207875 R15: 0000000000000000
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 00000000538eb000 CR4: 00000000000006e0
 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed
0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed
0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976
/usr/share/locale/en_US/LC_TELEPHONE
2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937
/usr/share/locale/en_US/LC_ADDRESS
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986
/usr/share/locale/en_US/LC_PAPER
2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161
/usr/share/locale/en_US/LC_MONETARY
2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930
/usr/share/locale/en_US/LC_NUMERIC
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311

gcc[14010] general protection rip:404498 rsp:7fffffb15b20 error:0

Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi
snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem
snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom
ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf
firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev
sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan
button ac
Pid: 14010, comm: gcc Not tainted 2.6.13
RIP: 0033:[<0000000000404498>] [<0000000000404498>]
RSP: 002b:00007fffffb15b20  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000051b6b8 RCX: 000000000051b5d0
RDX: 000000000051b5b0 RSI: 00002aaaaadee808 RDI: 0000000001000000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 00002aaaaadee620 R11: 0000000001000010 R12: 000000000051b500
R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001
FS:  00002aaaaadf2b00(0000) GS:ffffffff80575880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaac21c90 CR3: 0000000055058000 CR4: 00000000000006e0
 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66
/proc/$$/maps:
00400000-00417000 r-xp 00000000 03:06 7375558 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375558 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 03:06 7375558 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311

Comment 93 Daniel Drake 2005-09-17 10:18:42 UTC
Downstream bug report: http://bugs.gentoo.org/show_bug.cgi?id=104151 (although
no useful info to add at this point)
Comment 94 Linus Torvalds 2005-09-17 11:04:19 UTC
This may be stupid, but as far as I can tell, this bug only seems to 
happen on AMD CPU's. 

Maybe it's just because Intel's chips aren't that common, or used that
much in 64-bit setups, but that just strikes me as unlikely.

Maybe it's because it takes some very special timing, and AMD just happens 
to hit it.

But I get the feeling that it is more likely because of some architectural 
feature. The biggest suspect is the TLB. There's two things that AMD does 
differently:

 - the AMD TLB is much bigger, iirc, with the L2 picking up more entries

   IOW, we may have a TLB flushing bug that wouldn't show with a smaller 
   TLB.

 - the AMD tlb is reported to be "smarter", and a TLB flush doesn't 
   necessarily flush all entries - it supposedly tracks memory contents 
   for some entries with its "tlb flush filter".

Now, we _know_ there are errata in the SMP tlb flush filter. AMD documents
them in their errata sheet (errata 63 and 122: "TLB Flush Filter causes
coherency problem in multiprocessor systems").

I dunno. Does this patch (totally untested, may not compile, somebody 
should check it) make any difference?

			Linus

diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -831,11 +831,26 @@ static void __init amd_detect_cmp(struct
 #endif
 }
 
+#define HWCR 0xc0010015
+
 static int __init init_amd(struct cpuinfo_x86 *c)
 {
 	int r;
 	int level;
 
+#if CONFIG_SMP
+	unsigned long value;
+
+	// Disable TLB flush filter by setting HWCR.FFDIS:
+	// bit 6 of msr C001_0015
+	//
+	// Errata 63 for SH-B3 steppings
+	// Errata 122 for all(?) steppings
+	rdmsrl(HWCR, value);
+	value |= 1 << 6;
+	wrmsrl(HWCR, value);
+#endif
+
 	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
 	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
 	clear_bit(0*32+31, &c->x86_capability);

Comment 95 Andi Kleen 2005-09-17 11:58:47 UTC
On Sat, 2005-09-17 at 11:04 -0700, Linus Torvalds wrote:
> This may be stupid, but as far as I can tell, this bug only seems to 
> happen on AMD CPU's. 
> 
> Maybe it's just because Intel's chips aren't that common, or used that
> much in 64-bit setups, but that just strikes me as unlikely.
> 
> Maybe it's because it takes some very special timing, and AMD just happens 
> to hit it.
> 
> But I get the feeling that it is more likely because of some architectural 
> feature. The biggest suspect is the TLB. There's two things that AMD does 
> differently:
> 
>  - the AMD TLB is much bigger, iirc, with the L2 picking up more entries
> 
>    IOW, we may have a TLB flushing bug that wouldn't show with a smaller 
>    TLB.
> 
>  - the AMD tlb is reported to be "smarter", and a TLB flush doesn't 
>    necessarily flush all entries - it supposedly tracks memory contents 
>    for some entries with its "tlb flush filter".
> 
> Now, we _know_ there are errata in the SMP tlb flush filter. AMD documents
> them in their errata sheet (errata 63 and 122: "TLB Flush Filter causes
> coherency problem in multiprocessor systems").
> 
> I dunno. Does this patch (totally untested, may not compile, somebody 
> should check it) make any difference?

Everybody seems to want to blame all kinds of bugs on that one, I get
asked about it all the time :)

It might be worth a try, but merging that particular patch would be a
mistake because it doesn't limit the steppings where the flush filter
is disabled (so please don't merge it). If anything it should be limited
to max E6 stepping, which is known to have this problem.  On later
CPUs it might have unintended side effects. Also I would wait 
for feedback from people.

Davej had a similar patch in fedora iirc so he might know if the address
space randomization problem still happens there.  My feeling is that
that bug is too easy to hit on some setups so that it could be this
particular erratum.

-Andi


> 
> 			Linus
> 
> diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
> --- a/arch/x86_64/kernel/setup.c
> +++ b/arch/x86_64/kernel/setup.c
> @@ -831,11 +831,26 @@ static void __init amd_detect_cmp(struct
>  #endif
>  }
>  
> +#define HWCR 0xc0010015
> +
>  static int __init init_amd(struct cpuinfo_x86 *c)
>  {
>  	int r;
>  	int level;
>  
> +#if CONFIG_SMP
> +	unsigned long value;
> +
> +	// Disable TLB flush filter by setting HWCR.FFDIS:
> +	// bit 6 of msr C001_0015
> +	//
> +	// Errata 63 for SH-B3 steppings
> +	// Errata 122 for all(?) steppings
> +	rdmsrl(HWCR, value);
> +	value |= 1 << 6;
> +	wrmsrl(HWCR, value);
> +#endif
> +
>  	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
>  	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
>  	clear_bit(0*32+31, &c->x86_capability);


Comment 96 Linus Torvalds 2005-09-17 12:07:42 UTC

On Sat, 17 Sep 2005, Linus Torvalds wrote:
> 
> Now, we _know_ there are errata in the SMP tlb flush filter. AMD documents
> them in their errata sheet (errata 63 and 122: "TLB Flush Filter causes
> coherency problem in multiprocessor systems").

Btw, this particular errata would also happen to explain why this bug 
only started happening with the 4-level pages.

With the old three-level page tables, mm context switches were done by 
just switching the highest level entry in one global page table. That will 
necessarily invalidate the whole flush filter, and thus basically disable 
it in practice over any context switch.

With the four-level page tables, a MM context switch actually switches the
whole page table, and the flush filter thus remains active over a context
switch. Thus making any flush filter bugs much more likely to trigger.

NOTE! This is still just a theory of mine, based on documentation and some
personal assumptions about how the flush filter works. But it does seem to 
make sense. I'd love to hear if the patch makes any difference to 
behaviour..

		Linus

Comment 97 Linus Torvalds 2005-09-17 12:35:59 UTC

On Sat, 17 Sep 2005, Andi Kleen wrote:
> 
> It might be worth a try, but merging that particular patch would be a
> mistake because it doesn't limit the steppings where the flush filter
> is disabled (so please don't merge it). If anything it should be limited
> to max E6 stepping, which is known to have this problem.  On later
> CPUs it might have unintended side effects. Also I would wait 
> for feedback from people.

If the flush filter is already disabled, the thing should have no impact,
so I don't see the point in limiting it to anything else. The errata
sheets just say "disable it by setting HWCR.FFDIS", there are no
limitations on it (like some of the other errata fixes that say "only do
this for steppings C0 and higher" or similar).

And yes, maybe it could be limited to the E6 stepping, but basically right 
now that means every single CPU out there in the wild.

So from a testing standpoint, the patch looks fine (assuming I didn't
introduce some silly typo or something). From a "let's commit it"
standpoint, I obviously want to hear if it makes any difference, but
considering the pain this has caused for us, if I get even a single report
that it fixes the problem, I _am_ going to commit that fix without any
further questions.

So if we confirm this to fix the problem, and if we then get confirmation
from AMD that future CPU's have that thing fixed, we can limit it _then_.

It still leaves the flush filter on for non-SMP configs. We could allow it 
for CONFIG_SMP when only one CPU is found, but I don't think many people 
do that (the distributions seem to always have separate UP/SMP kernels, so 
at most you might run an SMP kernel on a UP machine at install time).

		Linus

Comment 98 Andrew Walrond 2005-09-17 14:07:51 UTC
I have applied your patch (manually) to my earliest known bad kernel; 2.6.11.12 
with Arjan's randomize_va_mappings patches. 
Built, booted and built gcc 3.3.6 with no segfaults/gpfs, which is encouraging 
since it rarely succeeded before. Currently building gcc 3.3.6 and gcc 4.0.x 
both at -j4. If that succeeds then I think you might have solved it. 
Will report back in a few minutes with progress report. 
 
[This is on a Tyan K8W with dual 250 opterons and 2Gb/cpu] 
Comment 99 Andrew Walrond 2005-09-17 14:33:53 UTC
All builds completed; no problems at all, so it's now looking very likely 
indeed that you have 'put your finger on it', so to speak. 
 
I'll build a complete distro overnight to really hammer it, and I'll patch up 
the latest kernel version tomorrow and confirm that it is also OK 
 
$ uname -r -v -m 
2.6.11.12-debug #1 SMP Sat Sep 17 21:32:21 BST 2005 x86_64 
$ cat /proc/sys/kernel/randomize_va_space 
1 
 
Questions: 
 
How much will this patch cripple my dual opteron machines? 
Is this something that could be fixed up with microcode/bios upgrades? 
 
Andrew 
 
Comment 100 Linus Torvalds 2005-09-17 15:33:31 UTC

On Sat, 17 Sep 2005 bugme-daemon@kernel-bugs.osdl.org wrote:
>
> All builds completed; no problems at all, so it's now looking very likely 
> indeed that you have 'put your finger on it', so to speak. 

Goodie.

> I'll build a complete distro overnight to really hammer it, and I'll patch up 
> the latest kernel version tomorrow and confirm that it is also OK 

Thanks.

> Questions: 
>  
> How much will this patch cripple my dual opteron machines? 
> Is this something that could be fixed up with microcode/bios upgrades? 

It's not likely to be a huge performance hit.

As mentioned, earlier kernels had effectively disabled this anyway for a
large number of TLB invalidates, since they wrote the top-level page
directory on task switch, and task switching is the most likely thing to
actually get helped by this. But the cost of TLB re-loads due to TLB
invalidates is negligible under most loads - usually you'd not switch back 
and forth that much.

So yes, it will hit a few loads, and I bet you could benchmark it to see 
the effect, but I don't think you'll likely see it in any noticeable ways.

And no, a ucode/bios upgrade is unlikely to fix it. A BIOS upgrade might
have _hidden_ the bug (by having the BIOS do the same workaround that the
patch does), but the bug itself will almost certainly need a new CPU mask
revision to fix. I doubt any of this is really microcode: there's some
custom hardware to do TLB invalidate tracking on cache dirty and
invalidate time, and there's likely some case they just missed.

It's a bit sad, since it's a really clever feature and I like it, but 
it will get fixed eventually and then we can remove the workaround.

			Linus

Comment 101 Andrew Walrond 2005-09-17 15:45:50 UTC
A patched up 2.6.13.1 also seems fine. 
 
I've left a big distro build job running but don't expect any problems given 
results so far. 
 
I'd suggest an early stable branch release with this patch (or a modified 
version; Andi?). 
 
Great work - Thanks! 
 
Andrew 
 
Comment 102 Anonymous Emailer 2005-09-17 15:59:07 UTC
Reply-To: davej@redhat.com

On Sat, Sep 17, 2005 at 08:46:07PM +0200, Andi Kleen wrote:

 > Davej had a similar patch in fedora iirc so he might know if the address
 > space randomization problem still happens there.  My feeling is that
 > that bug is too easy to hit on some setups so that it could be this
 > particular erratum.

Initial test results look positive.  Before, people were seeing all
sorts of strange things ranging from bad pmd's, to bad swap entry msgs.
One of the Fedora users affected by this problem came up with a userspace
program that poked /dev/cpu/*/msr, which did the trick for him.
As an experiment I merged a similar patch to our kernel, which also
seems to have fixed things for those affected by this issue before.


https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=155857
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164941


		Dave
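
The userspace approach mentioned above can be sketched roughly as follows. This
is not the Fedora user's actual program, which is not reproduced in this thread.
It is a minimal illustration that assumes the msr driver is loaded (so that
/dev/cpu/N/msr exists), that it runs as root, and that the MSR number and bit
are the ones from Linus's patch (HWCR = 0xc0010015, FFDIS = bit 6). The function
names are invented for this example.

```c
#define _XOPEN_SOURCE 700      /* for pread()/pwrite() */
#define _FILE_OFFSET_BITS 64   /* the MSR number must fit in off_t */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MSR_HWCR   0xc0010015UL  /* MSR C001_0015, as in the patch */
#define HWCR_FFDIS (1ULL << 6)   /* bit 6: TLB flush filter disable */

/* Pure helper: return the given HWCR value with FFDIS set. */
uint64_t hwcr_set_ffdis(uint64_t hwcr)
{
	return hwcr | HWCR_FFDIS;
}

/*
 * Read-modify-write HWCR on one CPU through the msr character device.
 * The msr driver uses the file offset as the MSR number and transfers
 * 8 bytes per register. Returns 0 on success, -1 on any error.
 */
int disable_flush_filter(int cpu)
{
	char path[64];
	uint64_t val;
	int fd, ret = -1;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	if (pread(fd, &val, sizeof(val), (off_t)MSR_HWCR) == (ssize_t)sizeof(val)) {
		val = hwcr_set_ffdis(val);
		if (pwrite(fd, &val, sizeof(val), (off_t)MSR_HWCR) == (ssize_t)sizeof(val))
			ret = 0;
	}
	close(fd);
	return ret;
}
```

A caller would invoke disable_flush_filter(n) once for every CPU number n,
because the register is per-processor. Unlike the kernel patch, this only takes
effect after boot, so anything that ran earlier is not covered.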

Comment 103 Brian Hall 2005-09-17 20:11:51 UTC
With that setup TLB patch, I observed no segfaults on my dual Opteron while it
was under heavy load for several hours, compiling with -j4 and playing games
under cedega, and randomize_va_space set to 1. Average load during this period
ranged between 4.5 and 5.0. Previously I would get crashes in my syslog with any
kernel above 2.6.11 unless I did "echo 0 > /proc/sys/kernel/randomize_va_space". 

My system:
kernel 2.6.13-ck5 + setup TLB patchlet
MSI K8T Master2 FAR, 2x244 Opterons, 1GB registered DDR400
Gentoo 2005.1 with glibc amd64 speedup patches (glibc_overlay)
Comment 104 Andi Kleen 2005-09-18 00:12:23 UTC
Actually BIOS updates are supposed to fix it - according to AMD
the BIOS are supposed to disable the flush filter on affected
revisions. Unfortunately quite a lot of Taiwanese vendors
are not very good at adding errata workarounds...

The problem is that I don't want to disable it for all revisions because
it could cause problems on future CPUs. I guess it would be ok
if it's limited in the steppings (only up to E7).
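
One way such a stepping limit could look is sketched below. It only decodes the
family, model and stepping fields from the CPUID function 1 EAX signature, using
the standard x86 field layout. The cutoff passed to wants_ffdis_workaround() is
a hypothetical parameter, because this thread does not give the model numbers
that correspond to the E6 or E7 revisions, and the function names are invented
for this example.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Decode the CPUID(1).EAX signature into family/model/stepping. */
struct cpu_sig decode_cpuid_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model  = (eax >> 4) & 0xf;
	unsigned ext_family  = (eax >> 20) & 0xff;
	unsigned ext_model   = (eax >> 16) & 0xf;

	s.stepping = eax & 0xf;
	s.family = base_family;
	s.model = base_model;
	/* The extended fields only apply when the base family is 0xf. */
	if (base_family == 0xf) {
		s.family += ext_family;
		s.model |= ext_model << 4;
	}
	return s;
}

/*
 * Apply the workaround only to family 15 (K8) parts whose model does not
 * exceed max_model. max_model is a placeholder: the real cutoff would have
 * to come from AMD's revision guide.
 */
bool wants_ffdis_workaround(struct cpu_sig s, unsigned max_model)
{
	return s.family == 15 && s.model <= max_model;
}
```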
Comment 105 PaX Team 2005-09-18 09:11:17 UTC
just to round out the story:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/33340.pdf
Comment 106 Bongani Hlope 2005-09-18 12:05:21 UTC
Hi Linus,

Your patch fixes the problem for me too. I've been testing it for a whole day.

Thanx
Comment 107 Chris Caputo 2005-09-20 15:11:18 UTC
Linus, I also think your HWCR.FFDIS catch is on target.

After doing a BIOS update on my Tyan K8SE motherboard I found that this bug
stopped happening.  Rather than use your patch to set HWCR.FFDIS in init_amd() I
added some code to print out the current setting and sure enough, it is already
set - presumably by the new BIOS.

I am unable to confirm that the old BIOS did not set it because I don't want to
revert, but it seems like a safe bet it didn't.
Comment 108 Jules Colding 2005-09-21 00:09:38 UTC
I am still getting segfaults in my log after applying the HWCR.FFDIS patch, but
I have not got any of the general protection faults:

[ 5907.301418] flipflop[11381]: segfault at 0000000000000000 rip
00002aaaaac2082e rsp 00007fffffce84d0 error 6
[ 7109.144800] lament[11443]: segfault at 0000000000000000 rip 00002aaaab5ac82e
rsp 00007fffffd23780 error 6

I think that the segfaults might be another issue, maybe just the screensaver
not behaving?
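
For reference, the "error 6" in those lines can be decoded with the x86
page-fault error code bits (bit 0: the page was present, bit 1: the access was a
write, bit 2: the access came from user mode). A small sketch, with names
invented for this example:

```c
#include <stdbool.h>

struct pf_error {
	bool present;  /* bit 0: fault on a present page (protection fault) */
	bool write;    /* bit 1: faulting access was a write */
	bool user;     /* bit 2: fault happened in user mode */
};

/* Split the low three bits of an x86 page-fault error code. */
struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e;

	e.present = (code & 0x1) != 0;
	e.write   = (code & 0x2) != 0;
	e.user    = (code & 0x4) != 0;
	return e;
}
```

Decoding 6 gives a user-mode write to a page that was not present. Together with
the faulting address of 0, that is consistent with an ordinary NULL-pointer
write in the program rather than with the general protection faults reported
earlier in this bug.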
Comment 109 Bongani Hlope 2005-10-29 00:02:59 UTC
Now that 2.6.14 is finally here, I'm closing this bug report
Comment 110 Adrian Bunk 2006-04-22 10:27:30 UTC
*** Bug 4991 has been marked as a duplicate of this bug. ***
