Bug 4851
Summary: | x86-64 userspace random segfaults and protection errors | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Bongani Hlope (bonganilinux) |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | aj, akpm, alexn, andi-bz, andrew, arjan, ccaputo, colding, drepper, eshabtai, hallbw, j5uv22602, jeremyhu, kernel, mingo, ornati, roland, stefaan.deroeck, torvalds, zwane |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.11-mm1 - 2.6.13 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
2.6.13-rc4 .config file
Patch to arch/x86_64/kernel/traps.c to make it so register/code dump happens.
Gentoo forum thread on the same subject.
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes.
Dump maps of failing program; this is against 2.6.13-rc5
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes. |
Description
Bongani Hlope
2005-07-05 15:10:22 UTC
To diagnose this at all a /proc/<pid>/maps at the moment of crash is needed. Can you run one of these commands inside gdb (several times if needed) until one crashes? (just as paranoia check, please confirm you are not overclocking your system, are using the stock unmodified kernels and are not using binary or otherwise external kernel modules)

No the system is not overclocked and these are the default 2.6.11-mm1->2.6.12 kernels. There are no binary modules used

Where do we stand with this one? Still present in 2.6.13-rc4?

Not sure if this is useful or not, but I have also seen the problem described above with 2.6.12.3 while using a single Opteron 275 (dual-core) with no over-clocking. Tyan S2892 K8SE motherboard with 8 gigs of RAM. With limited testing I can say that setting kernel.randomize_va_space to zero caused the problem to not happen anymore.

Andrew, I confirm that this bug is still present in 2.6.13-rc4 and also that "sysctl kernel.randomize_va_space=0" prevents the bug from happening. My repro method is to compile gdb and nmap at the same time, using an Opteron 275. Results in:
chmod[17460] general protection rip:404207 rsp:7fffff906750 error:0
chmod[19921] general protection rip:2aaaaaaac274 rsp:7fffff9bda70 error:0
cat[26314] general protection rip:2aaaaaaac274 rsp:7fffffdbced0 error:0
gcc[26444] general protection rip:2aaaaaaac274 rsp:7fffff9bd740 error:0
grep[26929] general protection rip:2aaaaaaac274 rsp:7ffffffbdd10 error:0
cat[1764] general protection rip:2aaaaaaac274 rsp:7fffffbbe900 error:0

Chris, could you put up your .config and say what distribution you use? Thanks

Created attachment 5423 [details]
2.6.13-rc4 .config file
Distribution: Gentoo
.config: attached
taint: arcmsr.ko Areca RAID controller patches from 2.6.13-rc3-mm3
Created attachment 5424 [details]
Patch to arch/x86_64/kernel/traps.c to make it so register/code dump happens.
Here is additional info resulting from the application of this patch:
sed[5635] general protection rip:2aaaaaaac274 rsp:7ffffffbce50 error:0
Modules linked in:
Pid: 5635, comm: sed Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbce50 EFLAGS: 00010296
RAX: 93d1007b364ae0c6 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 000000596fc3e124 RSI: 93d1007b364ae0c6 RDI: 93d1007b363992ae
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbcec0 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000234dec000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
x86_64-pc-linux[7506] general protection rip:2aaaaaaac274 rsp:7ffffffbe850
error:0
Modules linked in:
Pid: 7506, comm: x86_64-pc-linux Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbe850 EFLAGS: 00010282
RAX: cfcaf86d6672c8d9 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000005c2b559228 RSI: cfcaf86d6672c8d9 RDI: cfcaf86d66617ac1
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbe8c0 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000238d93000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
rm[17664] general protection rip:2aaaaaaac274 rsp:7ffffffbdfd0 error:0
Modules linked in:
Pid: 17664, comm: rm Not tainted 2.6.13-rc4
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbdfd0 EFLAGS: 00010292
RAX: 8ed0b837358cefd6 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 00000063a8cfa4f8 RSI: 8ed0b837358cefd6 RDI: 8ed0b837357ba1be
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbe040 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000230cc8000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
Hi Andrew

I tested 2.6.13-rc4 with kernel.randomize_va_space=1 and the bug is still there. I've been trying to implement show_map when an application segfaults, sorry for being quiet. I just saw the traps patch, I'll apply and test.

I'm beginning to suspect that the problem might be caused by the size of the command line that is passed to the applications that segfault. All the applications that segfault usually take long command line arguments, maybe with the randomization this is not calculated correctly.

RE: long command lines... I am not seeing that. I've seen a command as short as "rm -f conftest.er1" result in a "general protection" trap. This was revealed by enabling core dumps ("sysctl kernel.core_uses_pid=1", "ulimit -c unlimited") and using gdb to see the command line.

Ingo forwarded some of the mails. Here's what I replied with. If somebody can reproduce it please provide this information.

If this is really the location this is all very curious. But we can investigate it.

What happens here is that a call to elf_machine_load_address() computes the load *offset* for the ld.so binary. Yes, the function is misnamed but only now that we have prelinking. For a non-prelinked binary the returned value is in fact the load address. For a prelinked binary the value is the difference of the prelink address and the actual load address.

Now, the elf_machine_dynamic() function returns the offset of the dynamic section. This is the simple offset in case the binary is not prelinked, or the adjusted address otherwise.

Important to realize is that load offset + dynamic section offset are always pointing to the dynamic section. In the non-prelinked case, the load offset is the "address" and the dynamic section offset is a real offset, in the prelinked case it is the other way around.

That's at least the theory. If the crashes are where the reporters say they are we should be able to debug it. So, please do the following after a crash (debian people need to adjust for their wrong paths):

1. Determine whether the crash is really in _dl_start. Just run

readelf -s /lib64/ld-linux-x86-64.so|egrep '_dl_start$'

This should give output like

23: 000000354ee01390 1019 FUNC LOCAL DEFAULT 9 _dl_start

This means the function _dl_start starts at 0x354ee01390 and is 1019 bytes long.

2. If the crash is really in _dl_start and in the loop handling the dynamic section we need to look at the code. Run

objdump -Sr /lib64/ld-linux-x86-64.so|less

and search for the definition of _dl_start. It should look something like this (this is a prelinked ld.so):

000000354ee01390 <_dl_start>:
354ee01390: 55 push %rbp
354ee01391: 48 89 fd mov %rdi,%rbp
354ee01394: 48 83 ec 10 sub $0x10,%rsp
354ee01398: 0f 31 rdtsc
354ee0139a: 89 d2 mov %edx,%edx
354ee0139c: 89 c0 mov %eax,%eax
354ee0139e: 48 c1 e2 20 shl $0x20,%rdx
354ee013a2: 48 09 c2 or %rax,%rdx
354ee013a5: 48 8b 05 f4 97 11 00 mov 1153012(%rip),%rax # 354ef1aba0 <_rtld_global+0xba0>
354ee013ac: 48 8d 35 dd ff ff ff lea -35(%rip),%rsi # 354ee01390 <_dl_start>
354ee013b3: 48 29 c6 sub %rax,%rsi
354ee013b6: 48 89 f7 mov %rsi,%rdi
354ee013b9: 48 03 3d 18 8c 11 00 add 1149976(%rip),%rdi # 354ef19fd8 <_GLOBAL_OFFSET_TABLE_+0x40>

The important part starts at 354ee013ac. We need to look at the variables referenced. First there is the little variable in .data. So

objdump -j .data -s /lib64/ld-linux-x86-64.so|less

and look for address 354ef1aba0 (see the comment at the end of the instruction, this is the effective address):

354ef1aba0 9013e04e 35000000 ...N5...

This means the value is 0x354ee01390.

Now look at the _DYNAMIC address, which is computed using the GOT (instruction at 0x354ee013b9, variable at 0x354ef19fd8):

objdump -j .got -s /lib64/ld-linux-x86-64.so|less

For me this shows:

354ef19fd8 189ef14e 35000000 00adf14e 35000000 ...N5......N5...

I.e., the address of _DYNAMIC is 0x354ef19e18.
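The two objdump steps above read a 64-bit value out of a byte-wise hex dump, which means reversing the byte order (x86-64 is little-endian). A minimal sketch of that decoding, assuming the dump text is passed in as a string; le64_from_objdump and hex_val are hypothetical helper names, not part of binutils or glibc:

```c
#include <stdint.h>

/* Convert one hex digit to its value; returns -1 for any other character. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Decode the first 8 bytes of an "objdump -s" style hex column
 * (for example "9013e04e 35000000") as a little-endian 64-bit value.
 * Spaces between byte groups are skipped; decoding stops early at the
 * first character that is not a hex digit or a space.
 */
uint64_t le64_from_objdump(const char *hex)
{
    uint64_t value = 0;
    int nbytes = 0;

    while (*hex != '\0' && nbytes < 8) {
        int hi, lo;

        if (*hex == ' ') {
            hex++;
            continue;
        }
        hi = hex_val(hex[0]);
        lo = (hi >= 0) ? hex_val(hex[1]) : -1;
        if (hi < 0 || lo < 0)
            break;
        /* The n-th byte in the dump is the n-th least significant byte. */
        value |= (uint64_t)((hi << 4) | lo) << (8 * nbytes);
        nbytes++;
        hex += 2;
    }
    return value;
}
```

Called with the .data dump "9013e04e 35000000" this yields 0x354ee01390, and with the first eight bytes of the .got dump, "189ef14e 35000000", it yields 0x354ef19e18, the two values quoted in the steps above.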
If the load address matches the prelink address, the value in %rsi after the sub at address 0x354ee013b3 is zero and zero is added to the _DYNAMIC address (which better is correct). If the load address does not match the prelink address the computed dynamic section address is

real _dl_start address - _dl_start after prelink + _DYNAMIC after prelink

The "wrong" load addresses in the second and third terms cancel each other out.

This information should hopefully enable somebody to examine the crash in more detail. _dl_start is really almost the first code which is run. The _start function, which is the entry point, begins with

a30: 48 89 e7 mov %rsp,%rdi
a33: e8 18 11 00 00 callq 1b50 <_dl_start>

Another thing, has this problem been shown with Intel processors? Or other motherboards? I ask because we have other strange bugs related to dual opterons:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=160341

This bug is also some kind of memory corruption or address corruption of some sort since the kernel certainly creates the auxiliary vector even though the userland code doesn't see this. To me this smells almost like a hardware bug.

I'm using a Tyan 2892 mobo with a dual-core Opteron 275. Bongani, what motherboard are you using with the dual-proc Opteron 244's? Ulrich, you might be right about it being hardware, but I'd be surprised since the problem doesn't show up when kernel.randomize_va_space is set to 0.

My Motherboard is a MSI K8T Master2 FAR. But this board survived 8 hours of memtest+ before I reported the bug.

[apologies for the length of this]

I replaced my /lib64/ld-2.3.5.so which was lacking symbols (was stripped) with one with symbols. Rebooted and reproduced bug again, by compiling gdb and nmap at the same time. Here's a walk-through to summarize what I see...
[dmesg snip of one of the faults:] rm[7790] general protection rip:2aaaaaaac274 rsp:7ffffffbdd20 error:0 Modules linked in: Pid: 7790, comm: rm Not tainted 2.6.13-rc4 RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>] RSP: 002b:00007ffffffbdd20 EFLAGS: 00010292 RAX: 989efc3b3b47a1d3 RBX: 0000000000000000 RCX: 00002aaaaaaabab0 RDX: 00000093d0db7607 RSI: 989efc3b3b47a1d3 RDI: 989efc3b3b3653bb RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffffffbdd90 R15: 0000000000000000 FS: 00002aaaaaff36d0(0000) GS:ffffffff80538880(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaabbfd90 CR3: 0000000237755000 CR4: 00000000000006e0 48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21 00 00 70 bd 31 00 00 00 41 bb ff fd ff 6f 41 bc 34 fe ff 6f bb ff fe ff 6f 41 bd 40 ff ff 6f eb 17 66 66 66 90 66 66 90 --- # gdb rm core.7790 [...] Core was generated by `rm -f conftest.er1'. Program terminated with signal 11, Segmentation fault. #0 0x00002aaaaaaac274 in ?? () (gdb) x/10i $rip 0x2aaaaaaac274: Cannot access memory at address 0x2aaaaaaac274 (gdb) bt #0 0x00002aaaaaaac274 in ?? () #1 0x0000000000000000 in ?? () #2 0x0000000000000000 in ?? () [...] #2138 0x006d722f6e69622f in ?? () #2139 0x006d722f6e69622f in ?? () #2140 0x0000000000000000 in ?? () Cannot access memory at address 0x7ffffffc2000 --- Simple test program (x.c). Bytes match dmesg byte dump above. 
Just a sanity check to make sure it really is ld.so: char x[] = { 0x48,0x8b,0x00,0x48,0x85,0xc0,0x74,0x74,0x48,0x89,0xc2,0x41,0xb9,0xff, 0xff,0xff,0x6f,0x41,0xba,0x21,0x00,0x00,0x70,0xbd,0x31,0x00,0x00,0x00, 0x41,0xbb,0xff,0xfd,0xff,0x6f,0x41,0xbc,0x34,0xfe,0xff,0x6f,0xbb,0xff, 0xfe,0xff,0x6f,0x41,0xbd,0x40,0xff,0xff,0x6f,0xeb,0x17,0x66,0x66,0x66, 0x90,0x66,0x66,0x90 }; main() { printf("Hello\n"); } --- gcc -o x -g x.c gdb x (gdb) b *0x00002aaaaaaac274 Breakpoint 1 at 0x2aaaaaaac274 (gdb) b main Breakpoint 2 at 0x4004bc: file x.c, line 12. (gdb) run Starting program: /root/x Breakpoint 1, 0x00002aaaaaaac274 in ?? () (gdb) x/10i $rip 0x2aaaaaaac274: mov (%rax),%rax 0x2aaaaaaac277: test %rax,%rax 0x2aaaaaaac27a: je 0x2aaaaaaac2f0 0x2aaaaaaac27c: mov %rax,%rdx 0x2aaaaaaac27f: mov $0x6fffffff,%r9d 0x2aaaaaaac285: mov $0x70000021,%r10d 0x2aaaaaaac28b: mov $0x31,%ebp 0x2aaaaaaac290: mov $0x6ffffdff,%r11d 0x2aaaaaaac296: mov $0x6ffffe34,%r12d 0x2aaaaaaac29c: mov $0x6ffffeff,%ebx (gdb) x/10i x [char array from test program with assembly matching] 0x5008a0 <x>: mov (%rax),%rax 0x5008a3 <x+3>: test %rax,%rax 0x5008a6 <x+6>: je 0x50091c 0x5008a8 <x+8>: mov %rax,%rdx 0x5008ab <x+11>: mov $0x6fffffff,%r9d 0x5008b1 <x+17>: mov $0x70000021,%r10d 0x5008b7 <x+23>: mov $0x31,%ebp 0x5008bc <x+28>: mov $0x6ffffdff,%r11d 0x5008c2 <x+34>: mov $0x6ffffe34,%r12d 0x5008c8 <x+40>: mov $0x6ffffeff,%ebx While gdb is running "ps" in another shell reveals: root 18210 0.0 0.0 188 36 pts/1 T 19:14 0:00 /root/x # cat /proc/18210/maps 00400000-00401000 r-xp 00000000 08:02 536783 /root/x 00500000-00501000 rw-p 00000000 08:02 536783 /root/x 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so 2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so 7ffffff40000-7ffffff56000 rw-p 7ffffff40000 00:00 0 [stack] ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso] Noted that RIP is in /lib64/ld-2.3.5.so code. 
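The check that the faulting RIP lies inside the ld.so text mapping can also be done mechanically. The following is a minimal sketch, assuming one line of /proc/<pid>/maps is available as a string; maps_line_contains is a hypothetical helper name. Each maps line starts with "start-end" in hexadecimal, where the start address is inclusive and the end address is exclusive.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return 1 if addr falls inside the address range at the start of one
 * /proc/<pid>/maps line, 0 if it does not, and -1 if the line does not
 * begin with a "start-end" pair of hexadecimal addresses.
 */
int maps_line_contains(const char *line, uint64_t addr)
{
    uint64_t start, end;

    if (sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) != 2)
        return -1;
    /* The range is half-open: start is included, end is not. */
    return (addr >= start && addr < end) ? 1 : 0;
}
```

For the listing above, the address 0x2aaaaaaac274 matches only the line "2aaaaaaab000-2aaaaaac0000 r-xp ... /lib64/ld-2.3.5.so", which is the executable mapping of the dynamic linker.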
Per Ulrich's suggestions: # readelf -s /lib64/ld-linux-x86-64.so.2|egrep '_dl_start$' 23: 0000000000001220 1507 FUNC LOCAL DEFAULT 9 _dl_start # objdump -Sr /lib64/ld-linux-x86-64.so.2|less [skipped down to _dl_start] 0000000000001220 <_dl_start>: 1220: 41 56 push %r14 1222: 49 89 fe mov %rdi,%r14 1225: 41 55 push %r13 1227: 41 54 push %r12 1229: 55 push %rbp 122a: 53 push %rbx 122b: 48 83 ec 40 sub $0x40,%rsp 122f: 0f 31 rdtsc 1231: 48 c1 e2 20 shl $0x20,%rdx 1235: 89 c0 mov %eax,%eax 1237: 4c 8d 05 6a 41 11 00 lea 1130858(%rip),%r8 # 1153a8 <_rtld_global+0x3a8> 123e: 48 09 c2 or %rax,%rdx 1241: 48 8b 05 d8 45 11 00 mov 1131992(%rip),%rax # 115820 <_rtld_global+0x820> 1248: 48 8d 3d d1 ff ff ff lea -47(%rip),%rdi # 1220 <_dl_start> 124f: 48 29 c7 sub %rax,%rdi 1252: 48 89 f8 mov %rdi,%rax 1255: 48 03 05 7c 3d 11 00 add 1129852(%rip),%rax # 114fd8 <_GLOBAL_OFFSET_TABLE_+0x40> 125c: 48 89 3d 05 41 11 00 mov %rdi,1130757(%rip) # 115368 <_rtld_global+0x368> 1263: 48 89 15 26 3b 11 00 mov %rdx,1129254(%rip) # 114d90 <start_time> 126a: 48 89 05 07 41 11 00 mov %rax,1130759(%rip) # 115378 <_rtld_global+0x378> 1271: 48 89 c6 mov %rax,%rsi 1274: 48 8b 00 mov (%rax),%rax 1277: 48 85 c0 test %rax,%rax 127a: 74 74 je 12f0 <_dl_start+0xd0> 127c: 48 89 c2 mov %rax,%rdx 127f: 41 b9 ff ff ff 6f mov $0x6fffffff,%r9d 1285: 41 ba 21 00 00 70 mov $0x70000021,%r10d 128b: bd 31 00 00 00 mov $0x31,%ebp 1290: 41 bb ff fd ff 6f mov $0x6ffffdff,%r11d 1296: 41 bc 34 fe ff 6f mov $0x6ffffe34,%r12d 129c: bb ff fe ff 6f mov $0x6ffffeff,%ebx 12a1: 41 bd 40 ff ff 6f mov $0x6fffff40,%r13d 12a7: eb 17 jmp 12c0 <_dl_start+0xa0> [...] # objdump -j .data -s /lib64/ld-linux-x86-64.so.2| grep 115820 115820 20120000 00000000 ....... # objdump -j .got -s /lib64/ld-linux-x86-64.so.2| grep 114fd8 114fd8 184e1100 00000000 00000000 00000000 .N.............. So _DYNAMIC equals 0x114e18. But since I don't appear to be prelinked I am not sure what we can determine from this. 
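Even without prelinking, the values above are enough to predict what %rax should hold at the faulting instruction. The sketch below applies the formula from Ulrich's earlier comment (real _dl_start address - stored _dl_start + stored _DYNAMIC). The function names are hypothetical, and the runtime _dl_start address 0x2aaaaaaac220 is an assumption derived from the maps listing above (ld.so mapped at 0x2aaaaaaab000, plus the 0x1220 symbol value).

```c
#include <stdint.h>

/*
 * Load offset as computed at the top of _dl_start: the address where
 * _dl_start really runs minus the address recorded in the binary.
 */
uint64_t load_offset(uint64_t runtime_dl_start, uint64_t stored_dl_start)
{
    return runtime_dl_start - stored_dl_start;
}

/*
 * Expected runtime address of the dynamic section: the load offset plus
 * the _DYNAMIC value stored in the GOT. For a prelinked ld.so loaded at
 * its prelink address the offset is zero and the stored value is used
 * unchanged.
 */
uint64_t dynamic_runtime_addr(uint64_t runtime_dl_start,
                              uint64_t stored_dl_start,
                              uint64_t stored_dynamic)
{
    return load_offset(runtime_dl_start, stored_dl_start) + stored_dynamic;
}
```

With the values from this report, dynamic_runtime_addr(0x2aaaaaaac220, 0x1220, 0x114e18) gives 0x2aaaaabbfe18, which lies inside the writable ld.so mapping 2aaaaabbf000-2aaaaabc1000. The RAX values in the register dumps look nothing like that address.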
The values as shown in the executable are fine. You now need to look at the appropriate memory location in the core file. The addresses before relocation are 0x115820 and 0x114fd8. So you have to add the load address of the dynamic linker to get the address at runtime. The value should be the same as in the executable since the relocation hasn't happened yet. You should see the wrong value, content of %rax, as the result. If not some magic made the value appear in the register. You have all the information available to trace the value which should be in the register.

I checked the appropriate locations in the core file and did the math. This resulted in a sane value for RAX instead of the garbage value.

Core was generated by `rm -f conftest.er1'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002aaaaaaac274 in ?? ()
(gdb) x/xg 0x2aaaaabc0820
0x2aaaaabc0820: 0x0000000000001220
(gdb) x/xg 0x2aaaaabbffd8
0x2aaaaabbffd8: 0x0000000000114e18

0x2aaaaaaac220 - 0x1220 + 0x114e18 = 0x2AAAAABBFE18 (sane), but...

(gdb) info registers rax
rax 0x9d96be7b808c9bc7 -7091271125601313849

Is it possible the page with this data wasn't fully instantiated when the code ran, but was fully instantiated by the time the core dump happened? Not sure how else to explain what is observed. I could see that kind of a race condition explaining why this is seen relatively infrequently.

Also, a question. Why can't I see the ld.so code from the core dump?

(gdb) x/10 $rip
0x2aaaaaaac274: Cannot access memory at address 0x2aaaaaaac274

On Sun, 31 Jul 2005 bugme-daemon@kernel-bugs.osdl.org wrote:
>
> Is it possible the page with this data wasn't fully instantiated when the code
> ran, but was fully instantiated by the time the core dump happened? Not sure
> how else to explain what is observed.

TLB or page table initialization bugs?

> Also, a question. Why can't I see the ld.so code from the core dump?
>
> (gdb) x/10 $rip
> 0x2aaaaaaac274: Cannot access memory at address 0x2aaaaaaac274

Possibly because it's file-backed:

    static int maydump(struct vm_area_struct *vma)
    ...
        /* If it hasn't been written to, don't write it out */
        if (!vma->anon_vma)
            return 0;

the core-dump should have a pointer to the file, and gdb should be able to read it from there, no?

Linus

almost sounds like a CPU bug. Randomization's effect could be that instead of one given TLB layout, it pretty much does a complete search of all possible TLB layouts, in terms of the hashing of virtual addresses used by ld.so. (as far as the stack pointer goes) So if the bug only happens with a certain virtual memory layout, randomization will 'spread out' the likelihood of hitting the bug. Is there any (strong) correlation between the CPU types used for this?

Even if I include the RH bugzilla bug the reported problems are restricted to SMP machines with opterons. Maybe somebody who talks to AMD can get them to try to reproduce the problem.

another thing: does the bug only happen with the SMP kernel?

On Mon, 1 Aug 2005, mingo@elte.hu wrote:
> Is there any (strong) correlation between the CPU types used for this?

I am not sure what you mean. Right now we've got Bongani with a dual-Opteron 244 on an MSI K8T Master2 FAR and me with a single dual-core Opteron 275 on a Tyan 2892.

> another thing: does the bug only happen with the SMP kernel?

I just tried a non-SMP kernel and was unable to repro. Bongani can you confirm?

could you check one more thing: could you bind your shell to one of the CPUs (e.g. the first CPU), via "taskset 01 -p $$"? This way all commands started from that shell will run on CPU#0. Can you reproduce the bug with such a 'serialized' setup? I.e. does the bug depend on true SMP parallelism? (taskset is in the schedutils package) please try to do all testing on CPU#0, initially.
but if it's not reproducible bound to a single CPU, feel free to play with other possibilities - like compiling on CPU#0 from one shell, and doing the other compile on CPU#1. we already know the UP kernel does not show the bug, so what might help a bit is to figure out what type of parallelism is needed to trigger the bug.

you might also want to figure out a simpler reproducer. Does it need a full kernel compile to get the segfaults - or is it enough to compile two C files in parallel on two CPUs to trigger the faults, etc.

On Mon, 1 Aug 2005, mingo@elte.hu wrote:
> could you check one more thing: could you bind your shell to one of the
> CPUs (e.g. the first CPU), via "taskset 01 -p $$"? This way all commands
> started from that shell will run on CPU#0. Can you reproduce the bug
> with such a 'serialized' setup? I.e. does the bug depend on true SMP
> parallelism?

If both compiles are run on CPU #0 the bug still happens.

If the nmap compile is done exclusively on CPU #0 while the gdb compile is done exclusively on CPU #1 the bug still happens.

> you might also want to figure out a simpler reproducer. Does it need a
> full kernel compile to get the segfaults - or is it enough to compile
> two C files in parallel on two CPUs to trigger the faults, etc.

I haven't run into the problem with kernel compiles. For me the repro happens during the "./configure" stage of the nmap/gdb builds using the "emerge" function on Gentoo. The programs which have crashed have been: cat, chmod, gcc, grep, mkdir, mv, rm, sed and x86_64-pc-linux.

Interestingly, while trying to find an easier repro, the following silly program...

    #include <unistd.h>
    main()
    {
        int i = 1000, pid;
        while (i--) {
            pid = fork();
            if (pid == 0)
                execl("/usr/bin/rm", "rm", "-f", "/tmp/x");
            printf(".");
            usleep(10000);
        }
    }

... when run from two shells at the same time occasionally results in the Red Hat Bugzilla #160341 as evidenced by the following being spit out:

Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
You have invoked `ld.so', the helper program for shared library executables.
[...]

There was no fault reported when that happened and this test program was the first time I have seen that. Strangely one shell had it happen over and over again while another shell has yet to demonstrate it. Both shells have processor affinity set to 0x03 (both CPUs).

I'm working to identify a better repro...

could you boot the SMP kernel but with maxcpus=1? This basically has the effect of binding all tasks to CPU#0.

if the bug does not occur with maxcpus=1, then it is strange that even if you bind everything to CPU#0 in the 2-CPU case, the bug still happens. This means that some other task, which may run on CPU#1, has an impact. That could be anything from pdflush threads to kswapd threads. (don't try to taskset those system threads, some of them are already bound and need to run on the right CPU!)

another thing: if you are testing this in X, could you do the testing on a text console? That would ensure that by binding the shell to CPU#0, all tasks will run on CPU#0 - while in the X case both X and gnome-terminal would be able to run on CPU#1.

On Tue, 2 Aug 2005, mingo@elte.hu wrote:
> could you boot the SMP kernel but with maxcpus=1? This basically has the
> effect of binding all tasks to CPU#0.
>
> if the bug does not occur with maxcpus=1, then it is strange that even
> if you bind everything to CPU#0 in the 2-CPU case, the bug still
> happens. This means that some other task, which may run on CPU#1, has an
> impact. That could be anything from pdflush threads to kswapd threads.
> (don't try to taskset those system threads, some of them are already
> bound and need to run on the right CPU!)

The bug does not happen with "maxcpus=1" on an SMP kernel.

> another thing: if you are testing this in X, could you do the testing on
> a text console? That would ensure that by binding the shell to CPU#0,
> all tasks will run on CPU#0 - while in the X case both X and
> gnome-terminal would be able to run on CPU#1.

X is not installed on this machine. All of my tests are being done via an SSH login. Would there be any difference between an SSH session and a text console login?

By the way, I have simplified the repro a little. I can now get it to happen by running the following program in one shell, while doing just an nmap compile in another shell. No longer is a gdb compile needed:

    main()
    {
        long i = 1;
        while (i++);
    }

Also, further info I have learned. The nmap compile under Gentoo (done with the "emerge nmap" command) involves compiling in what Gentoo calls a "sandbox". This sandbox sets LD_PRELOAD to "libsandbox.so" which is a library which intercepts and logs for later reporting, certain syscalls in order to keep the compile from going outside the sandbox. I'm working to find a repro that doesn't involve the nmap compilation or the sandbox, but have not yet succeeded. I am not sure why the sandbox/LD_PRELOAD makes a difference, but it seems to. Of course, this doesn't explain how the other reporter is seeing the problem with a "make -j4" kernel compile, while I am unable to repro it doing that.

It seems that there is some kind of critical section or data that isn't being protected properly, and which gets occasionally trashed when both CPUs are non-idle.

Created attachment 5479 [details]
Gentoo forum thread on the same subject.
Has someone managed to reproduce this on an Intel system?

On Tue, Aug 02, 2005 at 10:14:20AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
>
> Also, further info I have learned. The nmap compile under Gentoo (done
> with the "emerge nmap" command) involves compiling in what Gentoo calls a
> "sandbox". This sandbox sets LD_PRELOAD to "libsandbox.so" which is a
> library which intercepts and logs for later reporting, certain syscalls in
> order to keep the compile from going outside the sandbox. I'm working to
> find a repro that doesn't involve the nmap compilation or the sandbox, but
> have not yet succeeded. I am not sure why the sandbox/LD_PRELOAD makes a
> difference, but it seems to. Of course, this doesn't explain how the
> other reporter is seeing the problem with a "make -j4" kernel compile,
> while I am unable to repro it doing that.
>

actually libsandbox.so is broken:

--- libsandbox.c~ 2005-08-02 18:54:52.000000000 -0400
+++ libsandbox.c 2005-08-02 18:54:52.000000000 -0400
@@ -677,7 +677,7 @@ fopen64(const char *pathname, const char
 if FUNCTION_SANDBOX_SAFE_CHAR ("fopen64", canonic, mode) {
 check_dlsym(fopen64);
- result = true_fopen(pathname, mode);
+ result = true_fopen64(pathname, mode);
 }
 return result;

fixes this one for me (btw libsandbox.so code is, well, let's say a bit of a shock when you're used to looking at kernel code)

This is what I have to report back. I run make -j4 for kernel and make -j4 for qt on different terminals to reproduce this bug.

1. On an SMP kernel I can still see this bug
2. On a non-SMP kernel I can't reproduce it
3. On an SMP kernel with maxcpus=1, I can't reproduce it
4. On an SMP kernel running schedtools -a 0x1 -e make -j4 for the kernel build, while running schedtools -a 0x2 -e make -j4 for qt on another terminal, I can't reproduce the bug.

I added some code to print arg_start and arg_end for the failing applications and a failing gcc process (arg_end - arg_start) gave me 33. So it's not long command line arguments that are causing this.
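The affinity values used in these experiments ("taskset 01", 0x03, "-a 0x1", "-a 0x2") are hexadecimal bitmasks in which bit n stands for CPU n. A minimal sketch of how such a mask can be interpreted follows; mask_allows_cpu is a hypothetical helper name, not a function from schedutils.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Return 1 if the hexadecimal affinity mask (with or without a leading
 * "0x") has the bit for the given CPU set, 0 if the bit is clear, and
 * -1 if the mask cannot be parsed or the CPU number does not fit in an
 * unsigned long.
 */
int mask_allows_cpu(const char *hexmask, unsigned int cpu)
{
    char *end;
    unsigned long mask;

    if (cpu >= sizeof(unsigned long) * CHAR_BIT)
        return -1;
    mask = strtoul(hexmask, &end, 16);
    if (end == hexmask || *end != '\0')
        return -1;
    return ((mask >> cpu) & 1UL) ? 1 : 0;
}
```

Read this way, "01" and "0x1" allow only CPU#0, "0x2" allows only CPU#1, and "0x03" allows both CPUs of the two-CPU machines discussed here.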
More report back. After testing a non-SMP kernel I rebuilt an SMP kernel. The strange thing was that on this kernel I could not reproduce the bug. Before I went to get a rope to shoot myself for sending you guys on a wild goose chase I compared the working config file against the previous non-working one. I rebuilt the kernel using the breaking config file and I could still reproduce the bug. It seems like the bug is dependent on selecting NUMA settings. Here is the diff between the working and the breaking kernels.

--- /usr/src/config 2005-08-02 06:42:55.000000000 +0200
+++ /usr/src/linux-2.6.8/.config 2005-08-02 19:52:29.000000000 +0200
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.13-rc4
-# Sat Jul 30 09:36:44 2005
+# Tue Aug 2 19:52:29 2005
 #
 CONFIG_X86_64=y
 CONFIG_64BIT=y
@@ -90,19 +90,16 @@
 # CONFIG_PREEMPT_VOLUNTARY is not set
 CONFIG_PREEMPT=y
 CONFIG_PREEMPT_BKL=y
-CONFIG_K8_NUMA=y
+# CONFIG_K8_NUMA is not set
 # CONFIG_NUMA_EMU is not set
-CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
-CONFIG_NUMA=y
-CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
-CONFIG_ARCH_SPARSEMEM_ENABLE=y
+# CONFIG_NUMA is not set
+CONFIG_ARCH_FLATMEM_ENABLE=y
 CONFIG_SELECT_MEMORY_MODEL=y
-# CONFIG_FLATMEM_MANUAL is not set
+CONFIG_FLATMEM_MANUAL=y
 # CONFIG_DISCONTIGMEM_MANUAL is not set
-CONFIG_SPARSEMEM_MANUAL=y
-CONFIG_SPARSEMEM=y
-CONFIG_NEED_MULTIPLE_NODES=y
-CONFIG_HAVE_MEMORY_PRESENT=y
+# CONFIG_SPARSEMEM_MANUAL is not set
+CONFIG_FLATMEM=y
+CONFIG_FLAT_NODE_MEM_MAP=y
 CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
 CONFIG_HAVE_DEC_LOCK=y
 CONFIG_NR_CPUS=2
@@ -147,7 +144,6 @@
 CONFIG_ACPI_FAN=m
 CONFIG_ACPI_PROCESSOR=m
 CONFIG_ACPI_THERMAL=m
-CONFIG_ACPI_NUMA=y
 # CONFIG_ACPI_ASUS is not set
 # CONFIG_ACPI_IBM is not set
 # CONFIG_ACPI_TOSHIBA is not set

Zwane> Has someone managed to reproduce this on an intel system?

Not to my knowledge. One person on the Gentoo forum reported it on a P4, but it doesn't seem conclusively to be the same issue.
I inquiried if they had a /proc/sys/kernel/randomize_va_space and they did not (ie. older kernel). They mentioned that their issue seemed to go away when they "turned off support for PAGEEXEC in the kernel (for PAX protection)". > actually libsandbox.so is broken: [...] > - result = true_fopen(pathname, mode); > + result = true_fopen64(pathname, mode); The version I am running, 1.2.11, has the true_fopen64. Hi Chris The configuration file, without NUMA support seems to work fine for me. I even pushed the kernel compile to make -j8, while running make -j4 for qt on another terminal My config does _not_ have NUMA enabled and yet the problem happens. Bongani, does your system still show 2 CPUs when you have NUMA disabled. Yes it still shows 2 CPUs. With NUMA enabled I quickly get: gcc[30326] general protection rip:404498 rsp:7fffffd157a0 error:0 gcc[30326] arg_start: 140737485307281 aeg_end: 140737485307314 Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 30326, comm: gcc Not tainted 2.6.13-rc4 RIP: 0033:[<0000000000404498>] [<0000000000404498>] RSP: 002b:00007fffffd157a0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000051b5b8 RCX: 000000000051b650 RDX: 000000000051b630 RSI: 00002aaaaadee748 RDI: 0000000000001000 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 00002aaaaadee620 R11: 0000000000001010 R12: 000000000051b5e0 R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001 FS: 00002aaaaadf2b00(0000) 
GS:ffffffff80572880(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac21c90 CR3: 0000000039752000 CR4: 00000000000006e0 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66 grep[18028]: segfault at 000000000000000e rip 000000000040da42 rsp 00007fffffd11a40 error 4 rm[23322] general protection rip:2aaaaac32260 rsp:7fffffd09768 error:0 rm[23322] arg_start: 140737485252100 arg_end: 140737485252103 Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 23322, comm: rm Not tainted 2.6.13-rc4 RIP: 0033:[<00002aaaaac32260>] [<00002aaaaac32260>] RSP: 002b:00007fffffd09768 EFLAGS: 00010287 RAX: 65722e006e79642e RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 00007fffffd0a605 RDI: 65722e006e79642e RBP: 00007fffffd097c0 R08: 00007fffffd097dc R09: 00007fffffd097d8 R10: 65722e006e79642e R11: 00002aaaaac32200 R12: 0000000000407280 R13: 00007fffffd09a00 R14: 0000000000000000 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80572800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac31c10 CR3: 000000005f59b000 CR4: 00000000000006e0 f3 a4 4c 89 d0 c3 90 90 90 90 90 90 90 90 90 90 48 89 d0 48 genksyms[4217] general protection rip:400cd3 rsp:7fffffd08c40 error:0 genksyms[4217] arg_start: 140737485250339 arg_end: 140737485250340 Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss 
snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 4217, comm: genksyms Not tainted 2.6.13-rc4 RIP: 0033:[<0000000000400cd3>] [<0000000000400cd3>] RSP: 002b:00007fffffd08c40 EFLAGS: 00010206 RAX: 0000000000000000 RBX: 6b6172646e614d28 RCX: 00000000005125e7 RDX: 0000000000000004 RSI: 0000000000000001 RDI: 00000000005125d9 RBP: 00000000005125d9 R08: 0000000000000004 R09: 0000000000000003 R10: 0000000000516fd0 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000024 R14: 00007fffffd0932a R15: 00000000005125d9 FS: 00002aaaaadf2b00(0000) GS:ffffffff80572800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000050ad58 CR3: 0000000021110000 CR4: 00000000000006e0 8b 4b 10 83 f9 01 75 cc 30 c9 41 83 fc 01 44 89 e2 75 d5 30 I added this "debug" line: gcc[30326] arg_start: 140737485307281 aeg_end: 140737485307314 rm[23322] arg_start: 140737485252100 arg_end: 140737485252103 to see the size of the command line arguments Another repro method, from a person on the Gentoo forum: tar -xjvf gimp-2.3.2.tar.bz2 cd gimp-2.3.2/ ./configure make -j4 "It usually compiles fine for a while (2-5 minutes) before breaking. It seems random though because I can restart the compilation (no "make clean") and it will work for a little while and then crash in a different place. Sometimes a segfault or general prot won't cause make to error out. Here are the errors I got this time. 
I have another window open with root running sed[29234] general protection rip:40870a rsp:7fffffd1a020 error:0 sed[31892] general protection rip:40870a rsp:7fffffd189b0 error:0 I had firefox, top, folding@home and xmms running also" [...] NOTE: The RIP on these is different than the ones where the problem happens in _dl_start(). > Another repro method, from a person on the Gentoo forum:
Did that person also find that
echo 0 > /proc/sys/kernel/randomize_va_space
fixed it up?
Yes, the gimp compile repro was fixed with:
echo 0 > /proc/sys/kernel/randomize_va_space
I think this means we can say the Gentoo sandbox LD_PRELOAD aggravates the problem but that the problem happens regardless.
On Tue, Aug 02, 2005 at 06:14:59PM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> I think this means we can say the Gentoo sandbox LD_PRELOAD aggravates the
> problem but that the problem happens regardless.
the sandbox code is REALLY bad and depends on a series of undefined glibc behaviors and orderings of shared libraries that get loaded, and as such it's hard to be sure it's the same bug we're chasing in this bug.
In addition, are all these gentoo reports with a kernel.org kernel or with a kernel patched with PaX and whatnot? If it's the latter then we need to discard those for now (simply because 4 level pagetables went in about the same time and PaX is more likely to interact with that)
Can someone point to the latest libsafe code just in case?
> the sandbox code is REALLY bad and depends on a series of undefined glibc
> behaviors and orderings of shared libraries that get loaded, and as such
> it's hard to be sure it's the same bug we're chasing in this bug.
You should have seen it some time ago :/ I am slowly trying to clean it up, but it is not that high on my priority list. If you can however point out something definite, I will be more than happy to look at it.
> Can someone point to the latest libsafe code just in case?
libsafe or libsandbox ?
On Wed, Aug 03, 2005 at 02:52:36AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> You should have seen it some time ago :/ I am slowly trying to clean it up,
> but it is not that high on my priority list. If you can however point out
> something definite, I will be more than happy to look at it.
apparently I looked at a slightly older one so I'd like to look at the latest one first before continuing. (Well not "like to" per se, I just had lunch :)
> > > Can someone point to the latest libsafe code just in case?
> >
> > libsafe or libsandbox ?
eh libsandbox
Latest svn sources are here: ftp://ftp.nosferatu.za.org/pub/sandbox-1.2.11.tar.bz2
As I said, I know there are still a few issues, and implementation details, with needed cleanups ... the original writer departed us with it in a bad shape. Among the main things that I will need to address in the near future is to actually have versioned symbols, and both old/new versions when building on glibc. The last few months however were more just spent trying to get all known bugs (corruption, etc) and some cleanups done.
On Wed, 3 Aug 2005, arjanv@redhat.com wrote:
> In addition, are all these gentoo reports with a kernel.org kernel or
> with a kernel patched with PaX and whatnot? If it's the latter then we
> need to discard those for now (simply because 4 level pagetables went in
> about the same time and PaX is more likely to interact with that)
The Intel person I think we can rule out was a PaX user. The AMD users have reported the problem with kernel.org's 2.6.12.3 and 2.6.13-rc4.
> tar -xjvf gimp-2.3.2.tar.bz2
> cd gimp-2.3.2/
> ./configure
> make -j4
>
> "It usually compiles fine for a while (2-5 minutes) before breaking. It seems
> random though because I can restart the compilation (no "make clean") and it
> will work for a little while and then crash in a different place. Sometimes a
> segfault or general prot won't cause make to error out. Here are the errors I
> got this time.
I have another window open with root running > Yep, this is pretty much how I get this error svn update make -f Makefile.cvs cd /home/bongani/development/cpp/kde/build/kdelibs /home/bongani/development/cpp/kde/src/kdelibs/configure make -j4; make install; They happen randomly, which is why I still don't have the output of /proc/#/maps > sed[29234] general protection rip:40870a rsp:7fffffd1a020 error:0 > sed[31892] general protection rip:40870a rsp:7fffffd189b0 error:0 > The RIP have changed from some kerne l(I'll check when did they change), the new ones are around those values as well. genksyms[4217] general protection rip:400cd3 rsp:7fffffd08c40 error:0 rm[23322] general protection rip:2aaaaac32260 rsp:7fffffd09768 error:0 grep[18028]: segfault at 000000000000000e rip 000000000040da42 rsp 00007fffffd11a40 error 4 gcc[30326] general protection rip:404498 rsp:7fffffd157a0 error:0 I confirm that this bug applies to 2.6.13-rc5 too. Bongani, you may want to update the "kernel version" for this bug, if you confirm the same. Ok I tested 2.6.13-rc5 with i) NUMA-discontinues memory, NUMA-sparse memory and a non-NUMA kernel. 
They all hit the bug, but the NUMA-discontinues kernel silently fails with this error: i) NUMA-discontinues make -f scripts/Makefile.build obj=arch/x86_64/kernel gcc -Wp,-MD,arch/x86_64/kernel/.process.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-mandrake-linux-gnu/3.4.3/include -D__KERNEL__ -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -g -march=k8 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -DKBUILD_BASENAME=process -DKBUILD_MODNAME=process -c -o arch/x86_64/kernel/.tmp_process.o arch/x86_64/kernel/process.c :includes nested too deeply make[1]: *** [arch/x86_64/kernel/process.o] Error 1 make: *** [arch/x86_64/kernel] Error 2 The error goes away when I did echo 0 > /proc/sys/kernel/randomize_va_space ii) NUMA-sparsemem rm[23068] general protection rip:2aaaaac32260 rsp:7fffff908578 error:0 rm[13440] general protection rip:2aaaaac32260 rsp:7fffffd09818 error:0 gcc[10127] general protection rip:404498 rsp:7ffffff164e0 error:0 checking for inttypes.h... 
/home/bongani/development/cpp/kde/src/kdelibs/configure: line 7779: 13440 Segmentation fault rm -f conftest.er1 /bin/sh: line 1: 10127 Segmentation fault (core dumped) gcc -Wp,-MD,drivers/acpi/namespace/.nsxfeval.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-mandrake-linux-gnu/3.4.3/include -D__KERNEL__ -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -g -march=k8 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Os -DKBUILD_BASENAME=nsxfeval -DKBUILD_MODNAME=nsxfeval -c -o drivers/acpi/namespace/.tmp_nsxfeval.o drivers/acpi/namespace/nsxfeval.c Same errors as always non-NUMA gcc[25983] general protection rip:404498 rsp:7fffffb16900 error:0 /bin/sh: line 1: 25983 Segmentation fault (core dumped) gcc -Wp,-MD,drivers/ide/.ide-taskfile.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-mandrake-linux-gnu/3.4.3/include -D__KERNEL__ -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -g -march=k8 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Idrivers/ide -DKBUILD_BASENAME=ide_taskfile -DKBUILD_MODNAME=ide_core -c -o drivers/ide/.tmp_ide-taskfile.o drivers/ide/ide-taskfile.c make[2]: *** [drivers/ide/ide-taskfile.o] Error 139 So the non-NUMA kernel also fails, the previous success for this seems like a fluke. I cannot reproduce this with my dual opteron 246 system. Compiling kernel, gimp source, running prime95 while compiling gimp, it all works fine on 2.6.13-rc6. I've also been using 13-rc2, rc3, rc4, and rc5 and have not noticed the problem on any of them (though I didn't really stress previous rc's). 
opteron:/home/brandon/tmp/crap # cat /proc/sys/kernel/randomize_va_space 1 I have 4 gigs memory and am not using any swap. Must swap be enabled for this problem to occur? <a href="http://d3.vunct.net/yx619.txt">config</a> <a href="http://d3.vunct.net/oy620.txt">dmesg</a> <a href="http://d3.vunct.net/zy621.php">lsmod</a> <a href="http://d3.vunct.net/jz622.php">lspci</a> I seem to experience this problem too. I have not seen any segfaults since I did the "echo 0 > /proc/sys/kernel/randomize_va_space" trick. Full report with lots of info here: <http://kerneltrap.org/mailarchive/1/message/101754/thread> Can the people who use the patched gentoo kernel sources (the one with PaX) please not report things into this bug (or if you already have, please say so) because PaX adds another layer of randomisation and that may well conflict, and is only confusing this bugreport with effectively useless information as a result. I am not using the PaX enabled kernel by the "Hardened Gentoo" subproject. The kernel is the normal gentoo-sources. I guess need to somehow capture /proc/pid/maps of a segfaulting application. Unfortunately it's difficult because it has already segfaulted. What would work is to run it in gdb and when you get the segfault Ctrl-z and save /proc/pidofapplication/maps Then attach that to the bug report. Also include a known good maps of the same app. (this bugzilla really needs NEEDINFO) re: comment #52: Arjan, FYI, PaX wasn't released for .12 and .13 yet, therefore all reports here have nothing to do with it (not to mention that i won't add 'another layer', i'll simply replace it with a better one). If there's an adventurous soul out there who can reproduce the problem and wants to try it under PaX then feel free to email me in private. re: comment #54: why don't you log /proc/pid/maps on the GPF as well? Created attachment 5589 [details]
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes.
I didn't have luck with gdb, so I patched traps.c (the patch is now attached to
this bug) to show the maps at the time of the fault. Please review my patch for
any errors, in particular whether the for loop in show_maps() catches all of
the memory areas properly. I ask because I am surprised not to see stack areas
in the results below; then again, maybe that is an indicator of the problem.
Here are some results with 2.6.13-rc6:
grep[10384] general protection rip:2aaaaaaac274 rsp:7fffff9bcda0 error:0
Modules linked in:
Pid: 10384, comm: grep Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007fffff9bcda0 EFLAGS: 00010286
RAX: 8c8afb8a4852a409 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004c845a2eac RSI: 8c8afb8a4852a409 RDI: 8c8afb8a484155f1
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fffff9bce10 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000230eb7000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
00515000-00516000 rw-p 00515000 08:02 522299 [heap]
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
rm[10836] general protection rip:2aaaaaaac274 rsp:7ffffffbd8b0 error:0
Modules linked in:
Pid: 10836, comm: rm Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbd8b0 EFLAGS: 00010292
RAX: 9d96be7b808c9bc7 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004d854ff8e1 RSI: 9d96be7b808c9bc7 RDI: 9d96be7b807b4daf
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbd920 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 000000023184e000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00409000 r-xp 00000000 08:02 554409 /bin/rm
00508000-00509000 rw-p 00008000 08:02 554409 /bin/rm
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
sed[12011] general protection rip:2aaaaaaac274 rsp:7ffffffbe060 error:0
Modules linked in:
Pid: 12011, comm: sed Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007ffffffbe060 EFLAGS: 00010286
RAX: d0dfc8413e8ce609 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004ef8c9161d RSI: d0dfc8413e8ce609 RDI: d0dfc8413e7b97f1
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffffffbe0d0 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000234ec3000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-0041b000 r-xp 00000000 08:02 507682 /bin/sed
0051a000-0051b000 rw-p 0001a000 08:02 507682 /bin/sed
0051b000-00523000 rw-p 0051b000 08:02 507682 [heap]
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
chmod[12223] general protection rip:2aaaaaaac274 rsp:7fffff9bdfb0 error:0
Modules linked in:
Pid: 12223, comm: chmod Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007fffff9bdfb0 EFLAGS: 00010286
RAX: d08db7357b9badcf RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 0000004fb06107a4 RSI: d08db7357b9badcf RDI: d08db7357b8a5fb7
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fffff9be020 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000230ff0000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00409000 r-xp 00000000 08:02 554392 /bin/chmod
00508000-00509000 rw-p 00008000 08:02 554392 /bin/chmod
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
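A dump like the ones above can be cross-checked mechanically: which listed area, if any, contains the faulting RIP, and is the stack pointer covered by any listed area at all? The following is only a rough Python sketch of that check, using the grep[10384] values copied from the dump. The helper names `parse_maps` and `find_area` are made up for illustration and are not part of the attached patch.

```python
import re

# /proc/<pid>/maps-style lines copied from the grep[10384] dump above.
MAPS = """\
00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
00515000-00516000 rw-p 00515000 08:02 522299 [heap]
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
"""

# start-end perms offset dev inode [name]
LINE_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+\S+\s+\d+\s*(.*)$"
)


def parse_maps(text):
    """Return a list of (start, end, perms, file_offset, name) tuples."""
    areas = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            start, end, perms, off, name = m.groups()
            areas.append((int(start, 16), int(end, 16), perms, int(off, 16), name))
    return areas


def find_area(areas, addr):
    """Return the area containing addr (start inclusive, end exclusive), or None."""
    for area in areas:
        if area[0] <= addr < area[1]:
            return area
    return None


areas = parse_maps(MAPS)
rip = 0x2AAAAAAAC274  # from "rip:2aaaaaaac274"
rsp = 0x7FFFFF9BCDA0  # from "rsp:7fffff9bcda0"

rip_area = find_area(areas, rip)
rsp_area = find_area(areas, rsp)

if rip_area is not None:
    start, end, perms, off, name = rip_area
    print(f"rip {rip:#x} is in {name} ({perms}), file offset {rip - start + off:#x}")
else:
    print(f"rip {rip:#x} is not in any listed area")
print("rsp area:", rsp_area)
```

With these inputs the RIP lands in the r-xp area of /lib64/ld-2.3.5.so at file offset 0x1274, while the RSP is not covered by any listed area, which matches the observation that no stack area shows up in these dumps.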
On Wed, Aug 10, 2005 at 12:34:34PM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> grep[10384] general protection rip:2aaaaaaac274 rsp:7fffff9bcda0 error:0
> /proc/$$/maps:
> 00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
> 00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
> 00515000-00516000 rw-p 00515000 08:02 522299 [heap]
> 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
> 2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
ok now it gets interesting. For some reason, rip has run "off" the mapped area of ld.so it seems.....
can you get the following info?
eu-readelf -l /lib64/ld-2.3.5.so
that is supposed to give an idea of where the .so file says it wants to be mapped..
Created attachment 5590 [details]
Dump maps of failing program; this is against 2.6.13-rc5
;) I've just started testing a patch I've been working on as well. My patch is
based on fs/seq_file.c and fs/proc/task_mmu.c. It also needs reviewing.
Chris, I'll take a look at your patch; maybe we had the same idea. These are the
dumps I got using my patch. Oh, I also applied it to traps.c (it seems like the
only logical place to put it). Now back to the dumps:
gcc[29251] general protection rip:404498 rsp:7fffff917250 error:0
00400000-00417000 r-xp 00000000 03:06 7375679 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375679 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 00:00 0 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 00:00 0
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaa
gcc[26423] general protection rip:404498 rsp:7ffffff15d70 error:0
00400000-00417000 r-xp 00000000 03:06 7375679 /usr/bin/gcc-3.4.3
00516000-00518000 rw-p 00016000 03:06 7375679 /usr/bin/gcc-3.4.3
00518000-00539000 rw-p 00518000 00:00 0 [heap]
2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 00:00 0
2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so
2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaa
On Wed, 10 Aug 2005, arjanv@redhat.com wrote:
>> grep[10384] general protection rip:2aaaaaaac274 rsp:7fffff9bcda0 error:0
>> /proc/$$/maps:
>> 00400000-00414000 r-xp 00000000 08:02 522299 /bin/grep
>> 00514000-00515000 rw-p 00014000 08:02 522299 /bin/grep
>> 00515000-00516000 rw-p 00515000 08:02 522299 [heap]
>> 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
>> 2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
>
> ok now it gets interesting. For some reason, rip has run "off" the mapped
> area of ld.so it seems.....
But 2aaaaaaac274 is between 2aaaaaaab000 and 2aaaaaac0000, no?
> can you get the following info?
>
> eu-readelf -l /lib64/ld-2.3.5.so
>
> that is supposed to give an idea of where the .so file says it wants to be
> mapped..
# eu-readelf -l /lib64/ld-2.3.5.so
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x014b58 0x014b58 R E 0x100000
  LOAD           0x014c40 0x0000000000114c40 0x0000000000114c40 0x000be8 0x000d68 RW  0x100000
  DYNAMIC        0x014e18 0x0000000000114e18 0x0000000000114e18 0x000180 0x000180 RW  0x8
  GNU_EH_FRAME   0x01341c 0x000000000001341c 0x000000000001341c 0x0004b4 0x0004b4 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x8
  GNU_RELRO      0x014c40 0x0000000000114c40 0x0000000000114c40 0x0003c0 0x0003c0 R   0x1
  LOOS+84153728  0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000     0x8
Section to Segment mapping:
  Segment Sections...
   00     [RO: .hash .dynsym .dynstr .gnu.version .gnu.version_d .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame]
   01     [RELRO: .data.rel.ro .dynamic .got] .data .bss
   02     [RELRO: .dynamic]
   03     [RO: .eh_frame_hdr]
   04
   05     [RELRO: .data.rel.ro .dynamic .got]
   06
re: comment #56: your for loop checks for map->vm_next, it's not needed (this is why you don't see the stack vma which is normally the last one in the vma list and has a NULL vm_next field).
also, can you add a do_coredump() call as well then make the core available or analyze it yourself, in particular you should follow Ulrich Drepper's instructions to verify that the initial memory accesses in ld.so read proper values, it seems that memory is trashed somehow, you should see that in the coredump (don't forget to comment out the !vma->anon_vma check in maydump() as Linus pointed it already out above).
Created attachment 5597 [details]
Patch to x86-64 traps.c in 2.6.13-rc6 to show maps, regs and RIP bytes.
Fix bug in patch in which final memory area wasn't being shown.
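The patch itself is not reproduced in this thread, so the following is only a toy Python model of the kind of loop bug described in the review comment: a walk over a singly linked list whose loop condition tests the `vm_next` pointer rather than the current node, which stops one element early and so never reports the last area (normally the stack). The `Vma` class and both walk functions are hypothetical stand-ins, not kernel code.

```python
class Vma:
    """Toy stand-in for a kernel vm_area_struct: a label and a next pointer."""

    def __init__(self, label, vm_next=None):
        self.label = label
        self.vm_next = vm_next


def build_list(labels):
    """Build a singly linked list of Vma nodes in the given order."""
    head = None
    for label in reversed(labels):
        head = Vma(label, head)
    return head


def walk_checking_next(head):
    """Loop condition tests vma.vm_next, so the final node is never visited."""
    seen = []
    vma = head
    while vma is not None and vma.vm_next is not None:
        seen.append(vma.label)
        vma = vma.vm_next
    return seen


def walk_checking_node(head):
    """Loop condition tests vma itself, so every node is visited."""
    seen = []
    vma = head
    while vma is not None:
        seen.append(vma.label)
        vma = vma.vm_next
    return seen


labels = ["/bin/rm r-xp", "/bin/rm rw-p", "ld.so r-xp", "ld.so rw-p", "[stack]"]
head = build_list(labels)

print("checking vm_next:", walk_checking_next(head))
print("checking node:   ", walk_checking_node(head))
```

In this model the first walk returns every label except `[stack]`, and the second returns all five, which is the difference between the earlier dumps and the corrected ones.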
On Wed, 10 Aug 2005, pageexec@freemail.hu wrote:
> re: comment #56: your for loop checks for map->vm_next, it's not needed
> (this is why you don't see the stack vma which is normally the last one
> in the vma list and has a NULL vm_next field).
Thanks for the fix.
> also, can you add a do_coredump() call as well then make the core
> available or analyze it yourself, in particular you should follow Ulrich
> Drepper's instructions to verify that the initial memory accesses in
> ld.so read proper values, it seems that memory is trashed somehow, you
> should see that in the coredump (don't forget to comment out the
> !vma->anon_vma check in maydump() as Linus pointed it already out
> above).
I created a core dump with the !vma->anon_vma check in maydump() removed.
Accessing the data in the core dump and doing the math to match the disassembly produces valid data, whereas the register dump shows invalid data. This returns me to my speculation in comment #18 of "Is it possible the page with this data wasn't fully instantiated when then code ran, but was fully instantiated by the time the core dump happened?"
By the way, here's a corrected map dump with the stack area included:
rm[3706] general protection rip:2aaaaaaac274 rsp:7fffffbbdeb0 error:0
Modules linked in:
Pid: 3706, comm: rm Not tainted 2.6.13-rc6
RIP: 0033:[<00002aaaaaaac274>] [<00002aaaaaaac274>]
RSP: 002b:00007fffffbbdeb0 EFLAGS: 00010296
RAX: d5d0b6397b499cc3 RBX: 0000000000000000 RCX: 00002aaaaaaabab0
RDX: 00000051350aa74c RSI: d5d0b6397b499cc3 RDI: d5d0b6397b384eab
RBP: 0000000000000000 R08: 00002aaaaabc03a8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fffffbbdf20 R15: 0000000000000000
FS: 00002aaaaaff36d0(0000) GS:ffffffff804d2880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabbfd90 CR3: 0000000231880000 CR4: 00000000000006e0
48 8b 00 48 85 c0 74 74 48 89 c2 41 b9 ff ff ff 6f 41 ba 21
/proc/$$/maps:
00400000-00409000 r-xp 00000000 08:02 554409 /bin/rm
00508000-00509000 rw-p 00008000 08:02 554409 /bin/rm
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:02 537108 /lib64/ld-2.3.5.so
2aaaaabbf000-2aaaaabc1000 rw-p 00014000 08:02 537108 /lib64/ld-2.3.5.so
7fffffbab000-7fffffbc1000 rw-p 7fffffbab000 08:02 537108 [stack]
For the faults I see, in the disassembly the relevant lines are:
mov 1131992(%rip),%rax   # either 1131992(%rip) is garbage (likely)
lea -47(%rip),%rdi       # or -47(%rip) is garbage (unlikely, since an lea)
sub %rax,%rdi
mov %rdi,%rax
add 1129852(%rip),%rax   # RAX is consistently 0x114e18 more than RDI as
                         # expected, so problem starts prior to this add.
mov (%rax),%rax          # BAM!
My perception is that the "mov 1131992(%rip),%rax" is sometimes resulting in junk data landing in RAX as a result of the data section of ld-2.3.5.so not being fully mapped. Or something.
re: comment #62: thanks for the data. my thoughts: since the data in the coredump is apparently correct, we have a temporary glitch only (like we didn't know it already :-). where can it come from?
1. PC-relative addressing is wrong 'sometimes'
2. the CPU D$ is somehow out of sync
3. the (physical) page doesn't yet contain all data from DMA
4. the TLB entries point to the wrong page
5. the page table entries point to the wrong page
you can verify 5 by adding a printk into mm/memory.c:do_no_page() to show the arguments of set_pte_at() (probably you should add a strcmp(current->comm) check so that you don't flood your logs), do the same in the GPF handler (needs a page table walk) so we can compare. it'll probably not be the case since by the time of the coredump the PTEs are apparently fine as they point to expected data and the PTEs in question are not written in-between (in theory).
once you know that 5 is not the case, you can test case 4 by adding an explicit local TLB flush into do_no_page() where it already has a comment to the contrary (which is fine itself, this extra TLB flush would just make sure that the CPU doesn't actually have any entry, if it turns out that it does, there's a problem elsewhere in page table freeing/TLB flushing).
i don't know how to verify case 3, it 'should not happen' of course, but adding some logging somewhere in the block or whatever layer could prove it.
for case 2, you could add an explicit wbinvd() call into do_no_page(), it should ensure that the caches are in sync with main memory.
i have no idea how to test for 1, except for maybe writing a small program in assembly that would have some data in .data and check it through PC-relative addressing, then you'd run this program until it fails (that is, this should be a standalone/statically linked app, without any crt*/etc linked in, quickest hack is to just hexedit a copy of ld.so and have some app use the copy :-).
on a sidenote, everyone seemed to have solved/mitigated the issue by turning off randomization, however in Chris Caputo's logs i don't see any sign of mmap randomization yet the problem still manifested, how's that possible? or is it somehow the stack randomization that interferes with file mappings?
On Thu, Aug 11, 2005 at 04:12:16AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> on a sidenote, everyone seemed to have solved/mitigated the issue by turning off
> randomization, however in Chris Caputo's logs i don't see any sign of mmap
> randomization yet the problem still manifested, how's that possible? or is it
> somehow the stack randomization that interferes with file mappings?
the assumed culprit has been narrowed down to the part where the stack vma is randomized by one of the first reporters; just that disabled but with the other randomisations (including stack pointer) on didn't cause the crashes for him.
If we assume that that holds true for the other reporters as well, then, well, I'm quite surprised how that happens. I agree that if library randomisation was involved that would be a prime suspect, but on first sight it is not :-(
Also in the dumps it seems the stack is quite a large part away from any other mapping, so it doesn't look like accidental overlap either.
Yes, the maps show that the stack is far away from the crashes. Library randomisation doesn't seem to be the problem. The following crashes happen in different places. The first one is within ld.so and the rest are inside the actual failing application.
ar[23024] general protection rip:2aaaaabe31d7 rsp:7fffff909da8 error:0 Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 23024, comm: ar Tainted: G M 2.6.13-rc5 RIP: 0033:[<00002aaaaabe31d7>] [<00002aaaaabe31d7>] RSP: 002b:00007fffff909da8 EFLAGS: 00010202 RAX: 00002aaaaad49820 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000034 RSI: 00007fffff907640 RDI: 702e746f672e0074 RBP: 000000000050aa40 R08: 00002aaaab0a9b00 R09: ff4b4b405e424aff R10: 00002aaaaabc09e8 R11: 00002aaaaabe31d0 R12: 00007fffff90a505 R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000 FS: 00002aaaab0a9b00(0000) GS:ffffffff80574800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac3501d CR3: 00000000665c6000 CR4: 00000000000006e0 48 39 47 20 74 06 b8 01 00 00 00 c3 48 83 7f 18 00 74 f3 66 /proc/$$/maps: 00400000-0040b000 r-xp 00000000 03:06 7375767 /usr/bin/ar 0050a000-0050b000 rw-p 0000a000 03:06 7375767 /usr/bin/ar 0050b000-0052c000 rw-p 0050b000 03:06 7375767 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaac4a000 r-xp 00000000 03:06 14696454 /usr/lib64/libbfd-2.15.92.0.2.so 2aaaaac4a000-2aaaaad49000 ---p 00089000 03:06 14696454 /usr/lib64/libbfd-2.15.92.0.2.so 
2aaaaad49000-2aaaaad54000 rw-p 00088000 03:06 14696454 /usr/lib64/libbfd-2.15.92.0.2.so 2aaaaad54000-2aaaaad59000 rw-p 2aaaaad54000 03:06 14696454 2aaaaad74000-2aaaaad75000 rw-p 2aaaaad74000 03:06 14696454 2aaaaad75000-2aaaaad77000 r-xp 00000000 03:06 11911201 /lib64/libdl-2.3.4.so 2aaaaad77000-2aaaaae76000 ---p 00002000 03:06 11911201 /lib64/libdl-2.3.4.so 2aaaaae76000-2aaaaae78000 rw-p 00001000 03:06 11911201 /lib64/libdl-2.3.4.so 2aaaaae78000-2aaaaafa0000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaafa0000-2aaaab09f000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaab09f000-2aaaab0a2000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaab0a2000-2aaaab0a5000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaab0a5000-2aaaab0ab000 rw-p 2aaaab0a5000 03:06 11911311 7fffff8f6000-7fffff90b000 rw-p 7fffff8f6000 03:06 11911311 [stack] sed[10100] general protection rip:40870a rsp:7fffffd188b0 error:0 Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 10100, comm: sed Tainted: G M 2.6.13-rc5 RIP: 0033:[<000000000040870a>] [<000000000040870a>] RSP: 002b:00007fffffd188b0 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 0073726f74632e00 RCX: 5b2f257300652d00 RDX: 0000000000000001 RSI: 0000000000000031 RDI: 0073726f74632e00 RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565 R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005 R13: 00007fffffd18ad0 R14: 616c65722e00725f R15: 0000000000000000 FS: 
00002aaaaadf2b00(0000) GS:ffffffff80574800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000004086f0 CR3: 0000000049fdc000 CR4: 00000000000006e0 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 /proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5668924 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5668924 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5668924 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 7fffffd06000-7fffffd1c000 rw-p 7fffffd06000 03:06 11911311 [stack] gcc[32622] general protection rip:404498 rsp:7ffffff172a0 error:0 Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 
loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 32622, comm: gcc Tainted: G M 2.6.13-rc5 RIP: 0033:[<0000000000404498>] [<0000000000404498>] RSP: 002b:00007ffffff172a0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000051b6b8 RCX: 000000000051b5d0 RDX: 000000000051b5b0 RSI: 00002aaaaadee808 RDI: 0000000001000000 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 00002aaaaadee620 R11: 0000000001000010 R12: 000000000051b500 R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001 FS: 00002aaaaadf2b00(0000) GS:ffffffff80574880(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac21c90 CR3: 0000000042a51000 CR4: 00000000000006e0 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66 /proc/$$/maps: 00400000-00417000 r-xp 00000000 03:06 7375679 /usr/bin/gcc-3.4.3 00516000-00518000 rw-p 00016000 03:06 7375679 /usr/bin/gcc-3.4.3 00518000-00539000 rw-p 00518000 03:06 7375679 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 7ffffff03000-7ffffff19000 rw-p 7ffffff03000 03:06 11911311 [stack] My x86-64 assembly sucks, so here goes (so does my x86 ;) objdump -Sr /usr/bin/gcc the search for 404498 (that's where 
RIP was pointing)

   404463: 75 33                 jne    404498 <strcat@plt+0x2c60>
   404465: 48 8b 03              mov    (%rbx),%rax
   404468: 48 89 42 08           mov    %rax,0x8(%rdx)
   40446c: 48 89 13              mov    %rdx,(%rbx)
   40446f: 48 8b 5c 24 08        mov    0x8(%rsp),%rbx
   404474: 48 8b 6c 24 10        mov    0x10(%rsp),%rbp
   404479: 4c 8b 64 24 18        mov    0x18(%rsp),%r12
   40447e: 4c 8b 6c 24 20        mov    0x20(%rsp),%r13
   404483: 4c 8b 74 24 28        mov    0x28(%rsp),%r14
   404488: 4c 8b 7c 24 30        mov    0x30(%rsp),%r15
   40448d: 48 83 c4 38           add    $0x38,%rsp
   404491: c3                    retq
   404492: 41 89 45 08           mov    %eax,0x8(%r13)
   404496: eb a6                 jmp    40443e <strcat@plt+0x2c06>
-> 404498: 41 c7 06 00 00 00 00  movl   $0x0,(%r14)
   40449f: eb c4                 jmp    404465 <strcat@plt+0x2c2d>
   4044a1: 66                    data16
   4044a2: 66                    data16
   4044a3: 66                    data16
   4044a4: 90                    nop
   4044a5: 66                    data16
   4044a6: 66                    data16
   4044a7: 66                    data16

According to the maps dump I just posted (thanks for the patch, Chris), r14 has this value:
R14: 7478657400746f68

-----

The sed dump (I have three of them and they all happen at the same RIP):

   4086f0: 48 89 5c 24 f0        mov    %rbx,0xfffffffffffffff0(%rsp)
   4086f5: 48 89 6c 24 f8        mov    %rbp,0xfffffffffffffff8(%rsp)
   4086fa: 31 c0                 xor    %eax,%eax
   4086fc: 48 83 ec 18           sub    $0x18,%rsp
   408700: 83 fe ff              cmp    $0xffffffffffffffff,%esi
   408703: 48 89 fb              mov    %rdi,%rbx
   408706: 89 f5                 mov    %esi,%ebp
   408708: 74 1d                 je     408727 <getopt_long@plt+0x665f>
-> 40870a: 48 8b 47 08           mov    0x8(%rdi),%rax
   40870e: 48 39 07              cmp    %rax,(%rdi)
   408711: 74 23                 je     408736 <getopt_long@plt+0x666e>

The sed dumps all have the following values:
RDI: 0073726f74632e00 RAX: 0000000000000000

--

Oh, and the ar dump is inside /usr/lib64/libbfd-2.15.92.0.2.so and not ld.so.

Re: comment #65 and #66: it's not clear to me that your problem is the same as Chris Caputo's. In your case the invalid memory accesses are at addresses that are clearly ASCII strings, not 'random' garbage as is the case with Chris. That is, it looks like there may be some kind of memory management bug in ar/gcc/libbfd (buffer overflow, stale pointer access, etc.).

If you can isolate the full command line that was running when the bug triggered, you could run it through valgrind 3.0 repeatedly to see if it finds anything suspicious, especially for a case when the GPF is triggered as well. Of course this may in fact point to the actual bug as well, in that we now know what the 'bad' page contained (ASCII strings, presumably produced by the same or the same kind of application, as the strings are section names).

Comment on attachment 5590 [details]
Dump maps of failing program this is againt 2.6.13-rc5
Use Chris' patch
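For anyone who wants to check the "these are ASCII strings" observation against the register dumps above, here is a minimal userspace sketch that prints the bytes of a 64-bit register value in memory order (x86-64 is little-endian). The helper names reg_to_ascii and show_reg are made up for this illustration, and any byte that is not printable ASCII is shown as '_'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render the 8 bytes of a 64-bit value in memory (little-endian) order.
 * Printable ASCII is kept as-is; every other byte becomes '_'.
 * `out` must have room for 9 bytes: 8 characters plus the terminating NUL.
 */
void reg_to_ascii(uint64_t reg, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(reg >> (8 * i));
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '_';
    }
    out[8] = '\0';
}

/* Print one register next to its interpretation as string data. */
void show_reg(const char *name, uint64_t reg)
{
    char text[9];

    assert(name != NULL);
    reg_to_ascii(reg, text);
    printf("%-3s %016llx  \"%s\"\n", name, (unsigned long long)reg, text);
}
```

Fed the values from the dumps above, R14 = 7478657400746f68 from the gcc crash comes out as hot_text, RDI = 702e746f672e0074 from the ar crash as t_.got.p, and RBX/RDI = 0073726f74632e00 from the sed crash as _.ctors_, which fits the reading that the bad pointers are fragments of section names.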
Can someone please verify that this code correctly gets the environment and command line arguments for a task?

static void show_env(void)
{
    struct task_struct *task = current;
    char *cmdline;
    char *env;
    int len;
    struct mm_struct *mm = get_task_mm(task);

    cmdline = kmalloc(PAGE_SIZE, GFP_KERNEL);
    env = kmalloc(PAGE_SIZE, GFP_KERNEL);
    if (!cmdline)
        goto out;
    if (!env)
        goto out;
    if (!mm)
        goto out;

    memset(cmdline, 0, PAGE_SIZE);
    memset(env, 0, PAGE_SIZE);

    /* Copy at most one page (minus the terminating NUL) of each area. */
    len = mm->arg_end - mm->arg_start;
    if (len >= PAGE_SIZE)
        len = PAGE_SIZE - 1;
    access_process_vm(task, mm->arg_start, cmdline, len, 0);

    len = mm->env_end - mm->env_start;
    if (len >= PAGE_SIZE)
        len = PAGE_SIZE - 1;
    access_process_vm(task, mm->env_start, env, len, 0);

    /* The strings in each area are NUL-separated, so %s prints only the first one. */
    printk("%s\n%s\n", env, cmdline);
out:
    kfree(cmdline);
    kfree(env);
    if (mm)
        mmput(mm);  /* drop the reference taken by get_task_mm() */
}

If this is correct, then there is something wrong with how they are built inside the kernel. For a crashing sed, it prints:

LESSKEY=/etc/.less
sed

For a crashing gcc nothing is printed.

It seems like something else is causing programs to segfault. I've been trying to build gcc (all of gcc) with debugging symbols but.... Kernels 2.6.12 and 2.6.13-rc5 just hard-lock while compiling. This is with randomize_va_space set to zero.

Kernel 2.6.11 fails and throws these errors:

libstdc++.so.6.[26058]: segfault at 0000000000000001 rip 0000000000000001 rsp 00007ffffffff228 error 14
libstdc++.so.6.[21210]: segfault at 0000000000000001 rip 0000000000000001 rsp 00007ffffffff228 error 14
atkbd.c: Keyboard on isa0060/serio0 reports too many keys pressed.
cls_1_1byte.exe[24190]: segfault at 00007f000000000a rip 00002aaaaabc33a4 rsp 00007fffffffc6e0 error 4
cls_24byte.exe[24256]: segfault at 0000000000000000 rip 0000000000400acb rsp 00007fffffffc4a0 error 4
cls_64byte.exe[24419]: segfault at 0000000000000000 rip 0000000000400abe rsp 00007fffffffc220 error 4
nested_struct.e[25219]: segfault at 0000000000000000 rip 0000000000400ad2 rsp 00007fffffffc460 error 4
nxagent[2069] general protection rip:460d46 rsp:7ffffffff970 error:0
nxagent[17660] general protection rip:460d46 rsp:7ffffffff970 error:0
nxagent[4548] general protection rip:460d46 rsp:7ffffffff970 error:0
libstdc++.so.6.[19376]: segfault at 0000000000000001 rip 0000000000000001 rsp 00007ffffffff228 error 14

Kernel 2.6.10 fails with these errors:

cls_1_1byte.exe[8848]: segfault at 000000000000000a rip 0000002a9566e3a4 rsp 0000007fbfffc6e0 error 4
cls_24byte.exe[8879]: segfault at 0000000000000000 rip 0000000000400acb rsp 0000007fbfffc4a0 error 4
cls_64byte.exe[9004]: segfault at 0000000000000000 rip 0000000000400abe rsp 0000007fbfffc220 error 4
nested_struct.e[9777]: segfault at 0000000000000000 rip 0000000000400ad2 rsp 0000007fbfffc460 error 4
libstdc++.so.6.[590]: segfault at 0000000000000001 rip 0000000000000001 rsp 0000007fbffff228 error 14

I'll try to upgrade my userspace to Mandrake 2006 beta and see if this goes away (this will take a while on my 56k connection). After that I'll start ripping components out of my PC to see if it's a hardware bug.

This is from the GIMP repro on a stock kernel 2.6.13-rc6 + Bongani Hlope's traps patch.
Aug 21 18:03:04 gondor sed[24899] general protection rip:40870a rsp:7fffff918680 error:0 Aug 21 18:03:04 gondor Aug 21 18:03:04 gondor Modules linked in: nvidia w83627hf i2c_sensor i2c_isa i2c_viapro i2c_core snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore ehci_hcd uhci_hcd usbcore evdev Aug 21 18:03:04 gondor Pid: 24899, comm: sed Tainted: P 2.6.13-rc6 Aug 21 18:03:04 gondor RIP: 0033:[<000000000040870a>] [<000000000040870a>] Aug 21 18:03:04 gondor RSP: 002b:00007fffff918680 EFLAGS: 00010213 Aug 21 18:03:04 gondor RAX: 0000000000000000 RBX: 342e342e332d7073 RCX: 7300652d002f2f58 Aug 21 18:03:04 gondor RDX: 0000000000000001 RSI: 0000000000000031 RDI: 342e342e332d7073 Aug 21 18:03:04 gondor RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565 Aug 21 18:03:04 gondor R10: 00002aaaaabc09a8 R11: 00002aaaaac2f230 R12: 0000000000000005 Aug 21 18:03:04 gondor R13: 00007fffff9188a0 R14: 342e33206f6f746e R15: 0000000000000000 Aug 21 18:03:04 gondor FS: 00002aaaaade6ae0(0000) GS:ffffffff80380800(0000) knlGS:0000000056ed8bb0 Aug 21 18:03:04 gondor CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Aug 21 18:03:04 gondor CR2: 00000000004086f0 CR3: 0000000019875000 CR4: 00000000000006a0 Aug 21 18:03:04 gondor 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 Aug 21 18:03:04 gondor /proc/$$/maps: Aug 21 18:03:04 gondor 00400000-0041b000 r-xp 00000000 08:03 1179722 /bin/sed Aug 21 18:03:04 gondor 0051a000-0051b000 rw-p 0001a000 08:03 1179722 /bin/sed Aug 21 18:03:04 gondor 0051b000-00544000 rw-p 0051b000 08:03 1179722 [heap] Aug 21 18:03:04 gondor 2aaaaaaab000-2aaaaaac0000 r-xp 00000000 08:03 880828 /lib64/ld-2.3.5.so Aug 21 18:03:04 gondor 2aaaaaac0000-2aaaaaac1000 rw-p 2aaaaaac0000 08:03 880828 Aug 21 18:03:04 gondor 2aaaaabbf000-2aaaaabc0000 r--p 
00014000 08:03 880828 /lib64/ld-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 08:03 880828 /lib64/ld-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaabc1000-2aaaaacdd000 r-xp 00000000 08:03 880850 /lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaacdd000-2aaaaaddc000 ---p 0011c000 08:03 880850 /lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaaddc000-2aaaaaddf000 r--p 0011b000 08:03 880850 /lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaaddf000-2aaaaade2000 rw-p 0011e000 08:03 880850 /lib64/libc-2.3.5.so
Aug 21 18:03:04 gondor 2aaaaade2000-2aaaaade8000 rw-p 2aaaaade2000 08:03 880850
Aug 21 18:03:04 gondor 7fffff906000-7fffff91c000 rw-p 7fffff906000 08:03 880850 [stack]
Aug 21 18:03:04 gondor

Hope this helps.

# cat /proc/sys/kernel/randomize_va_space
1

Replying to my own comment #51... I have to add that the segfault still happens even though I have put 0 (zero) into "/proc/sys/kernel/randomize_va_space", but much more rarely. I got this in dmesg today when doing "make install" for some package:

### snip ###
[81394.522481] sed[30821] general protection rip:40870a rsp:7fffffb18070 error:0
[84745.026843] mkdir[18152]: segfault at 0000000000000000 rip 000000000040184d rsp 00007fffffffdd20 error 4

I can confirm that on my machine (SUN V20z -> Dual Opteron 248) the setting makes the general protection messages disappear. I run 2.6.13 (vanilla).

I have just found this thread after a week of pulling my hair out after a simple kernel upgrade. My input (for what it's worth):

Tyan K8W, dual opteron 250, NUMA, 2Gb/cpu
All vanilla kernels <= 2.6.11.12 OK
All vanilla kernels >= 2.6.12 (including 2.6.13) BROKEN

My test: Building gcc 3.3.6 from local terminal.
Typical GPFs:

log-2005-09-01-22:54:13:Sep 1 20:38:00 [kernel] sed[7623] general protection rip:404b75 rsp:7fffffd1a910 error:0
log-2005-09-01-22:54:13:Sep 1 21:40:28 [kernel] sed[8603] general protection rip:404b75 rsp:7ffffff191d0 error:0

I assume disabling randomize_va_space will fix it, although I have yet to try.

On Fri, Sep 02, 2005 at 08:05:42AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> ------- Additional Comments From andrew@walrond.org 2005-09-02 08:05 -------
> I have just found this thread after a week of pulling my hair out after a
> simple kernel upgrade. My input (for what its worth):
> Tyan K8W, dual opteron 250, NUMA,2Gb/cpu
> All vanilla kernels <= 2.6.11.12 OK
> All vanilla kernels >= 2.6.12 (including 2.6.13) BROKEN

Please mention your distro (if gentoo, are you using that preload lib?)

Distro is Heretix (http://www.h-e-r-e-t-i-x.org). Very simple from-source affair, nothing fancy at all; no preload lib, totally vanilla kernel(s), glibc-2-3-branch (20050826,nptl), binutils-2-16, gcc-3.4.4. Totally reliable in all respects with linux < 2.6.12.

I can also now confirm that, as expected, 2.6.13 seems fine with randomize_va_space disabled. (gcc just built twice without any problems; I have previously been unable to build once despite extensive tries with kernels 2.6.12.* - 2.6.13.)

Given the week I just had isolating this problem, I'm happy to help squash it :) If there is anything more I can do, let me know.

Scratch that. After more extensive build tests (of a new distro) I suddenly got loads of:

Sep 2 18:40:05 [kernel] as[11060]: segfault at 0000000000000000 rip 00000000004001a0 rsp 00007fffffffe570 error 6
Sep 2 18:40:06 [kernel] as[11954]: segfault at 0000000000000000 rip 00000000004001a0 rsp 00007fffffffe570 error 6
Sep 2 18:40:15 [kernel] as[16698]: segfault at 0000000000000000 rip 00000000004001a0 rsp 00007fffffffd4f0 error 6

while building binutils.
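The trailing "error N" on the segfault lines in this thread is the x86 page-fault error code, a small bit field. A rough userspace sketch of how to read it follows. It assumes the kernel prints the value in hexadecimal (so "error 14" means 0x14), which is believed to match the 2.6-era x86-64 fault handler but should be checked against the source. The macro, struct and function names are invented for this example. It does not apply to the "general protection ... error:0" lines, which carry a different kind of error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit positions of the x86 page-fault error code (names are illustrative). */
#define ERR_PRESENT (1UL << 0) /* 0: page not present, 1: protection violation */
#define ERR_WRITE   (1UL << 1) /* 0: read access, 1: write access */
#define ERR_USER    (1UL << 2) /* 0: fault in kernel mode, 1: in user mode */
#define ERR_RSVD    (1UL << 3) /* reserved bit was set in a page-table entry */
#define ERR_IFETCH  (1UL << 4) /* fault was caused by an instruction fetch */

struct fault_info {
    bool present;
    bool write;
    bool user;
    bool rsvd;
    bool ifetch;
};

/* Split an error code into its individual flags. */
void decode_fault(unsigned long code, struct fault_info *out)
{
    assert(out != NULL);
    out->present = (code & ERR_PRESENT) != 0;
    out->write   = (code & ERR_WRITE) != 0;
    out->user    = (code & ERR_USER) != 0;
    out->rsvd    = (code & ERR_RSVD) != 0;
    out->ifetch  = (code & ERR_IFETCH) != 0;
}

/* Print a one-line human-readable description of an error code. */
void print_fault(unsigned long code)
{
    struct fault_info f;
    const char *access;

    decode_fault(code, &f);
    access = f.ifetch ? "instruction fetch" : f.write ? "write" : "read";
    printf("error %lx: %s-mode %s, %s%s\n", code,
           f.user ? "user" : "kernel", access,
           f.present ? "protection violation" : "page not present",
           f.rsvd ? ", reserved bit set" : "");
}
```

Under that assumption, error 4 is a user-mode read of a page that is not present, error 6 is a user-mode write to a page that is not present (the as crashes at address 0), and error 14 is a user-mode instruction fetch from an unmapped page, consistent with the libstdc++ entries where rip itself is 0000000000000001.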
This is vanilla untainted 2.6.13 with

$ cat /proc/sys/kernel/randomize_va_space
0

I guess I'm stuck with 2.6.11.12 until this is resolved :(

Since I have a definite "2.6.11.12 works and 2.6.12 doesn't" situation, and a reliable test, perhaps a binary search/reversion of patches from 2.6.12 -> 2.6.11.12 would be useful? I haven't gotten round to using git yet, but I guess it would make the process relatively painless? Alternatively, perhaps you guys have some likely culprits in mind that I could revert?

Prime suspect is randomize_va_mappings. And if it's that, it's likely buggy user space too, not necessarily a kernel bug.

But since it still occurs with randomize_va_mappings disabled (echo 0 >), isn't this unlikely? I guess applying the randomize_va_mappings patch to 2.6.11.12 and testing that might be a quick route to some answers. Can someone with the relevant git setup extract and send me the relevant patch?

On Fri, Sep 02, 2005 at 02:10:06PM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> But since it still occurs with randomize_va_mappings disabled, (echo 0 >)
> isn't this unlikely?

Correct.

> I guess applying the randomize_va_mappings patch to 2.6.11.12 and testing
> that might be a quick route to some answers. Can someone with the relevant
> git setup extract and send me the relevant patch?

http://www.kernel.org/pub/linux/kernel/people/arjan/randomize/ has them broken out.

Another suspect would be 4 level pagetables; that got merged about the same time.

Sorry for the big post. I just did a quick trace and haven't analyzed any of this yet.
sed[23074] general protection rip:40870a rsp:7fffffb18510 error:0 Env start: 7fffffb1a543, Args start : 7fffffb1a520 LC_PAPER=en_US /bin/sed Modules linked in: isofs zlib_inflate rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor hotkey fan button ac Pid: 23074, comm: sed Tainted: G M 2.6.13-rc5 RIP: 0033:[<000000000040870a>] [<000000000040870a>] RSP: 002b:00007fffffb18510 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 7300652d002f2f58 RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343 RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565 R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005 R13: 00007fffffb18730 R14: 20322e3031207875 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80574800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000004086f0 CR3: 000000002aebe000 CR4: 00000000000006e0 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 /proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 
/usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 7fffffb06000-7fffffb1c000 rw-p 7fffffb06000 03:06 11911311 [stack] [bongani@bongani64 kdelibs]$ gdb /bin/sed -c core.23074 GNU gdb 6.3-3.1.102mdk (Mandrakelinux) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-mandrake-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1". Core was generated by `/bin/sed -e 1s/^X// -e s%/[^/]*$%%'. Program terminated with signal 11, Segmentation fault. Reading symbols from /lib64/tls/libc.so.6...done. Loaded symbols for /lib64/tls/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. 
Loaded symbols for /lib64/ld-linux-x86-64.so.2 #0 add1_buffer (b=0x554e4728203a4343, c=49) at utils.c:503 503 if (b->allocated - b->length < 1) (gdb) bt #0 add1_buffer (b=0x554e4728203a4343, c=49) at utils.c:503 #1 0x00000000004028d9 in add_then_next (b=0x554e4728203a4343, ch=49) at compile.c:300 #2 0x0000000000403676 in read_text (buf=0x0, leadin_ch=49) at compile.c:904 #3 0x0000000000404129 in compile_program (vector=0x20322e3031207875) at compile.c:1036 #4 0x000000000040495a in compile_string (cur_program=0x554e4728203a4343, str=0x31 <Address 0x31 out of bounds>, len=1) at compile.c:1568 #5 0x00000000004025e1 in main (argc=5, argv=0x7fffffb18738) at sed.c:212 (gdb) static void read_text P_((struct text_buf *buf, int leadin_ch)); static void read_text(buf, leadin_ch) struct text_buf *buf; int leadin_ch; { int ch; /* Should we start afresh (as opposed to continue a partial text)? */ if (buf) { if (pending_text) free_buffer(pending_text); pending_text = init_buffer(); buf->text = NULL; buf->text_length = 0; old_text_buf = buf; } /* assert(old_text_buf != NULL); */ if (leadin_ch == EOF) return; if (leadin_ch != '\n') add1_buffer(pending_text, leadin_ch); ch = inchar(); while (ch != EOF && ch != '\n') { if (ch == '\\') { ch = inchar(); if (ch != EOF) add1_buffer (pending_text, '\\'); } if (ch == EOF) { add1_buffer (pending_text, '\n'); return; } ch = add_then_next (pending_text, ch); <------- compile.c:904 -------- sed.c ------------------- int main(argc, argv) int argc; char **argv; { #ifdef REG_PERL #define SHORTOPTS "snrRue:f:l:i::V:" #else #define SHORTOPTS "snrue:f:l:i::V:" #endif static struct option longopts[] = { {"regexp-extended", 0, NULL, 'r'}, #ifdef REG_PERL {"regexp-perl", 0, NULL, 'R'}, #endif {"expression", 1, NULL, 'e'}, {"file", 1, NULL, 'f'}, {"in-place", 2, NULL, 'i'}, {"line-length", 1, NULL, 'l'}, {"quiet", 0, NULL, 'n'}, {"posix", 0, NULL, 'p'}, {"silent", 0, NULL, 'n'}, {"separate", 0, NULL, 's'}, {"unbuffered", 0, NULL, 'u'}, {"version", 0, 
NULL, 'v'}, {"help", 0, NULL, 'h'}, {NULL, 0, NULL, 0} }; int opt; int return_code; const char *cols = getenv("COLS"); initialize_main (&argc, &argv); #if HAVE_SETLOCALE /* Set locale according to user's wishes. */ setlocale (LC_ALL, ""); #endif initialize_mbcs (); #if ENABLE_NLS /* Tell program which translations to use and where to find. */ bindtextdomain (PACKAGE, LOCALEDIR); textdomain (PACKAGE); #endif if (getenv("POSIXLY_CORRECT") != NULL) posixicity = POSIXLY_CORRECT; else posixicity = POSIXLY_EXTENDED; /* If environment variable `COLS' is set, use its value for the baseline setting of `lcmd_out_line_len'. The "-1" is to avoid gratuitous auto-line-wrap on ttys. */ if (cols) { countT t = ATOI(cols); if (t > 1) lcmd_out_line_len = t-1; } myname = *argv; while ((opt = getopt_long(argc, argv, SHORTOPTS, longopts, NULL)) != EOF) { switch (opt) { case 'n': no_default_output = true; break; case 'e': the_program = compile_string(the_program, optarg, strlen(optarg)); <--sed.c:212 -------- compile.c ---------- /* `str' is a string (from the command line) that contains a sed command. Compile the command, and add it to the end of `cur_program'. */ struct vector * compile_string(cur_program, str, len) struct vector *cur_program; char *str; size_t len; { static countT string_expr_count = 0; struct vector *ret; prog.file = NULL; prog.base = CAST(unsigned char *)str; prog.cur = prog.base; prog.end = prog.cur + len; cur_input.line = 0; cur_input.name = NULL; cur_input.string_expr_count = ++string_expr_count; ret = compile_program(cur_program); <----- compile.c:1568 prog.base = NULL; prog.cur = NULL; prog.end = NULL; first_script = false; return ret; } /* Read a program (or a subprogram within `{' `}' pairs) in and store the compiled form in `*vector'. Return a pointer to the new vector. 
*/
static struct vector *compile_program P_((struct vector *));
static struct vector *
compile_program(vector)
    struct vector *vector;
{
    struct sed_cmd *cur_cmd;
    struct buffer *b;
    int ch;

    if (!vector)
      {
        vector = MALLOC(1, struct vector);
        vector->v = NULL;
        vector->v_allocated = 0;
        vector->v_length = 0;
        obstack_init (&obs);
      }
    if (pending_text)
        read_text(NULL, '\n');    <------------ compile.c:1036

static int add_then_next P_((struct buffer *b, int ch));
static int
add_then_next(b, ch)
    struct buffer *b;
    int ch;
{
    add1_buffer(b, ch);    <------------- compile.c:300
    return inchar();
}

-------- utils.c ------------

char *
add1_buffer(b, c)
    struct buffer *b;
    int c;
{
    /* This special case should be kept cheap;
     * don't make it just a mere convenience
     * wrapper for add_buffer() -- even "builtin"
     * versions of memcpy(a, b, 1) can become
     * expensive when called too often.
     */
    if (c != EOF)
      {
        char *result;
        if (b->allocated - b->length < 1)    <--- utils.c:503
            resize_buffer(b, b->length+1);
        result = b->b + b->length++;
        *result = c;
        return result;
      }
    return NULL;
}

On Sat, Sep 03, 2005 at 04:20:48AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> Pid: 23074, comm: sed Tainted: G M 2.6.13-rc5

You had a machine check first... what is that about? Does this happen as well without the machine check for you?

That happens every time I boot up (the machine check)... about 4 seconds after the system is fully loaded. I don't know what it means though. I tried to search for it on Google but nothing came up... It has been happening for a long time.

On Sat, 3 Sep 2005, bonganilinux@mweb.co.za wrote:
> That happens everytime I boot up (the Machine check)... about 4 seconds
> after the system is fully loaded. I don't know what it means though. I
> tried to search for it on google but nothing came up... It been
> happening for a long time.

Install and run mcelog after you see one of these.
Chris MCE log output [root@bongani64 linux-2.6]# mcelog --k8 MCE 0 CPU 0 1 instruction cache from boot or resume ADDR ff3fffffff3ffdaf Instruction cache ECC error bit32 = err cpu0 bit33 = err cpu1 bit34 = res2 bit35 = res3 bit39 = res7 bit40 = error found by scrub bit41 = res9 bit42 = res10 bit43 = res11 bit44 = res12 bit45 = uncorrected ecc error bit46 = corrected ecc error bit57 = processor context corrupt bit59 = misc error valid bit61 = error uncorrected bit62 = error overflow (multiple errors) STATUS fe37ffbfff3fffff MCGSTATUS 0 MCE 1 CPU 0 2 bus unit from boot or resume L2 cache ECC error Bus or cache array error bit46 = corrected ecc error bit61 = error uncorrected bit62 = error overflow (multiple errors) STATUS f00040000000c8ff MCGSTATUS 0 MCE 2 CPU 0 3 load/store unit from boot or resume MISC 8005003b8005003b ADDR ffb6fdfbff bit59 = misc error valid bit61 = error uncorrected STATUS bc0000000000c843 MCGSTATUS 0 MCE 3 CPU 1 0 data cache from boot or resume ADDR 7e5041006cc Data cache ECC error (syndrome f2) found by scrubber bit40 = error found by scrub bit45 = uncorrected ecc error bit46 = corrected ecc error bit57 = processor context corrupt bit59 = misc error valid bit61 = error uncorrected bit62 = error overflow (multiple errors) STATUS fe796100000002ea MCGSTATUS 0 MCE 4 CPU 1 1 instruction cache from boot or resume ADDR fbffcff8ffffffff Instruction cache ECC error bit32 = err cpu0 bit33 = err cpu1 bit34 = res2 bit35 = res3 bit39 = res7 bit40 = error found by scrub bit41 = res9 bit42 = res10 bit43 = res11 bit44 = res12 bit45 = uncorrected ecc error bit46 = corrected ecc error bit55 = res23 bit56 = res24 bit57 = processor context corrupt bit59 = misc error valid bit61 = error uncorrected bit62 = error overflow (multiple errors) STATUS ffffffffffffffff MCGSTATUS 0 MCE 5 CPU 1 2 bus unit from boot or resume ADDR d3f9ffe98b L2 cache ECC error Cache tag array error bit46 = corrected ecc error bit57 = processor context corrupt bit61 = error uncorrected STATUS 
a600400000033dbe MCGSTATUS 0
MCE 6
CPU 1 3 load/store unit from boot or resume
MISC 8005003b8005003b ADDR ac86a04594
bit57 = processor context corrupt
bit59 = misc error valid
bit61 = error uncorrected
bit62 = error overflow (multiple errors)
STATUS fe0000000000dccc MCGSTATUS 0

Ok; some recap, and some new results:

System: Tyan K8W dual opteron, 64bit NUMA kernels, 2Gb/cpu.
Symptoms: Random userland crashes during build of gcc-3.3.
All (vanilla, untainted) kernels <= 2.6.11.12 are entirely reliable.

I have now tested a vanilla 2.6.11.12 kernel, modified ONLY with Arjan's randomize_va_space patches.

Results:

With /proc/sys/kernel/randomize_va_space = 1:
Two failures on two separate gcc builds:
Sep 4 21:55:19 [kernel] cc1[15562] general protection rip:2aaaaaaac244 rsp:7ffffffbd990 error:0
Sep 5 09:04:12 [kernel] cc1[23242] general protection rip:2aaaaaaac244 rsp:7fffffdbd390 error:0

With /proc/sys/kernel/randomize_va_space = 0:
No failures. This is after 6 full gcc builds and a complete distro build from source. (Heretix allows distro builds with parallel package builds as well as parallel make jobs; --build-jobs=5 -make-jobs=4 is a very effective stress test.)

CONCLUSIONS

1) 4 level pagetables are not responsible for this bug, since they are not in this kernel.
2) As suspected, randomize_va_space is either directly causing the symptoms, or exposing a pre-existing problem.

I'm happy to test further, or make the test machine available via ssh.

Actually, 4level page tables went into 2.6.11, not later, and were enabled on amd64.

On another note, was anyone able to perform the suggested tests in comment #63?

My bad. Previous comments led me to believe 4level page tables went in during 2.6.12.*. However, if 4level page tables are indeed in 2.6.11.12, which has been reliable on my systems since its release, I guess the conclusions still stand.
As to the tests in #63; I'd be happy to apply and test any patches, but I don't have enough (any) kernel programming knowledge to generate said patch. Sorry for the big post again. I used this patch to test point 5 of comment #63. This is the lifetime of a gcc call that eventualy GPFs. I hope this is helpful --- mm/memory.c.orig 2005-09-06 20:38:44.000000000 +0200 +++ mm/memory.c 2005-09-06 22:49:58.000000000 +0200 @@ -1800,6 +1800,25 @@ return VM_FAULT_OOM; } +static void print_params(struct mm_struct *mm, unsigned long address, + pte_t page_table, pte_t entry) { + struct task_struct *tsk = current; + if(!strcmp("gcc", tsk->comm) || !strcmp("rm", tsk->comm)) { + + printk(KERN_INFO "print_params: %s[%d]\n", tsk->comm, tsk->pid); + printk(KERN_INFO "mm_struct: start_code=%lx, end_code=%lx", + mm->start_code, mm->end_code); + printk(KERN_INFO "start_data=%lx, end_data=%lx, start_brk=%lx", + mm->start_data, mm->end_data, mm->start_brk); + printk(KERN_INFO "brk=%lx, start_stack=%lx, arg_start=%lx", + mm->brk, mm->start_stack, mm->arg_start); + printk(KERN_INFO "arg_end=%lx, env_start=%lx, env_end=%lx\n", + mm->arg_end, mm->env_start, mm->env_end); + printk(KERN_INFO "page_table=%lx, address=%lx, entry=%lx\n", + pte_val(page_table), address, pte_val(entry)); + } +} + /* * do_no_page() tries to create a new page mapping. 
It aggressively * tries to share with existing pages, but makes a separate copy if @@ -1901,6 +1920,7 @@ entry = mk_pte(new_page, vma->vm_page_prot); if (write_access) entry = maybe_mkwrite(pte_mkdirty(entry), vma); + print_params(mm, address, *page_table, entry); set_pte_at(mm, address, page_table, entry); if (anon) { lru_cache_add_active(new_page); print_params: gcc[17689] mm_struct: start_code=0 end_code=0 start_data=0 end_data=0 start_brk=518000 brk=518000 start_stack=7fffffb17df4 arg_start=7fffffb17df4 arg_end=0 env_start=0 env_end=0 page_table=0 address=5178c8 entry=800000003b5a4067 print_params: gcc[17689] mm_struct: start_code=0 end_code=0 start_data=0 end_data=0 start_brk=518000 brk=518000 start_stack=7fffffb17df4 arg_start=7fffffb17df4 arg_end=0 env_start=0 env_end=0 page_table=0 address=2aaaaabc0868 entry=8000000063dc6067 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaaaba80 entry=7fa2d025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaaac130 entry=7fa2e025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabbffb8 entry=800000007f9e8025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 
address=2aaaaaab27b0 entry=7fa34025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab8a60 entry=7ee6d025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaabb894 entry=7f0ae025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaaba1c0 entry=7f0a7025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab9970 entry=7f0a6025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab3360 entry=7fe9c025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=400040 entry=6a284025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 
env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaaad005 entry=7fa2f025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=516de0 entry=8000000069d42025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaaaf540 entry=7fa31025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaabc374 entry=7f0af025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab4750 entry=7f9c1025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab5a00 entry=7f9c6025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab1190 entry=7fa33025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 
start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab69c0 entry=7fe9e025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaac0000 entry=800000007eea1025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaac42e8 entry=800000007eea5025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaac8624 entry=800000007eea9025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaad4725 entry=800000007eec5025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaaca7b4 entry=800000007eeab025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaad8559 entry=800000007eec9025 print_params: gcc[17689] 
mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaacb87c entry=800000007eeac025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaada488 entry=800000007eecb025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaad9426 entry=800000007eeca025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab08f0 entry=7fa32025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaab7ce0 entry=7f060025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaadeda18 entry=8000000040c26067 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 
env_end=7fffffb17e9a page_table=0 address=2aaaaadebaa0 entry=800000007eefb025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc1290 entry=7eee5025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd405c entry=7ef1d025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd5058 entry=7ef1e025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaade8f10 entry=800000004cb37067 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaade9000 entry=80000000420f3067 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd6000 entry=7ef1f025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 
start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd7000 entry=7ef20025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaadea000 entry=800000004425b067 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd8008 entry=7ef21025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd9000 entry=7ef22025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabda000 entry=7ef23025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaadec1e0 entry=8000000034bf2067 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabdb008 entry=7ef24025 print_params: gcc[17689] mm_struct: start_code=400000 
end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabdc000 entry=7ef25025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabcc8f4 entry=7ef15025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd2e9c entry=7ef1b025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabce878 entry=7ef17025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd3b71 entry=7ef1c025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc37b0 entry=7eee7025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc62dc 
entry=7eeea025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd0457 entry=7ef19025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabcd060 entry=7ef16025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabcbab8 entry=7ef14025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc83c8 entry=7eeec025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabd11cd entry=7ef1a025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc26e8 entry=7eee6025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 
env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabcf40e entry=7ef18025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabca408 entry=7ef13025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc4ea0 entry=7eee8025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc72d0 entry=7eeeb025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc9640 entry=7ef12025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabc5004 entry=7eee9025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabdd000 entry=7ef26025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 
start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=401010 entry=6a285025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac851b0 entry=7ef3d025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac30c90 entry=7ef5d025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabf1c60 entry=7ef7a025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=403290 entry=6a287025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabeefa0 entry=7ef77025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=40cc90 entry=6a100025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 
end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=402dd0 entry=6a286025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabe6690 entry=7ef6f025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=413de6 entry=6a107025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaacc533a entry=7e883025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaaccd692 entry=7e88b025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaacb8d60 entry=7e876025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabe8860 entry=7ef71025 print_params: gcc[17689] mm_struct: 
start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac296b0 entry=7ef56025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabe9710 entry=7ef72025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=4157ab entry=69d41025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac2b6e0 entry=7ef58025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac2c7a0 entry=7ef59025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac7e2f0 entry=7ef36025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 
address=2aaaaacb7290 entry=7e875025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac90a90 entry=7ef48025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac28850 entry=7ef55025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac2a000 entry=7ef57025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac2e520 entry=7ef5b025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=518000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac7db00 entry=7ef35025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac32200 entry=7ef5f025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 
arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabec300 entry=7ef75025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabef050 entry=7ef78025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac31b00 entry=7ef5e025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac2fb80 entry=7ef5c025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=4149c0 entry=69d40025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=40d638 entry=6a101025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=405040 entry=6a289025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 
start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabfaaa0 entry=7ef83025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=4043d0 entry=6a288025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=40ef40 entry=6a102025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=40f002 entry=6a103025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac81e70 entry=7ef39025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=406b14 entry=6a28a025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=407654 entry=6a28b025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 
start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac08b20 entry=7e8d4025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac002e0 entry=7ef89025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac19890 entry=7e8a3025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac268c0 entry=7ef53025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaacc730c entry=7e885025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac012d8 entry=7e8cd025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac0391f entry=7e8cf025 print_params: gcc[17689] mm_struct: 
start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac258e0 entry=7ef52025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac24fa0 entry=7ef51025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac78230 entry=7ef30025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=411e20 entry=6a105025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaac83480 entry=7ef3b025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 address=2aaaaabea000 entry=7ef73025 print_params: gcc[17689] mm_struct: start_code=400000 end_code=416db4 start_data=516db8 end_data=5178c8 start_brk=518000 brk=539000 start_stack=7fffffb15990 arg_start=7fffffb17df4 arg_end=7fffffb17e16 env_start=7fffffb17e16 env_end=7fffffb17e9a page_table=0 
address=2aaaaac21c90 entry=7ef4e025 gcc[17689] general protection rip:404498 rsp:7fffffb15710 error:0 Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac Pid: 17689, comm: gcc Not tainted 2.6.13 RIP: 0033:[phys_startup_64+3163032/2147483392] [phys_startup_64+3163032/2147483392] RIP: 0033:[<0000000000404498>] [<0000000000404498>] RSP: 002b:00007fffffb15710 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000051b6b8 RCX: 000000000051b5d0 RDX: 000000000051b5b0 RSI: 00002aaaaadee808 RDI: 0000000001000000 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 00002aaaaadee620 R11: 0000000001000010 R12: 000000000051b500 R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575880(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac21c90 CR3: 000000004887e000 CR4: 00000000000006e0 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66 /proc/$$/maps: 00400000-00417000 r-xp 00000000 03:06 7375558 /usr/bin/gcc-3.4.3 00516000-00518000 rw-p 00016000 03:06 7375558 /usr/bin/gcc-3.4.3 00518000-00539000 rw-p 00518000 03:06 7375558 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 
/lib64/tls/libc-2.3.4.so
2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so
2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311
Hi
Still testing comment #63.
With this patch (add local TLB flush into do_no_page)
diff -uNpr mm/memory.c.orig mm/memory.c
--- mm/memory.c.orig 2005-09-07 18:40:08.000000000 +0200
+++ mm/memory.c 2005-09-07 18:40:29.000000000 +0200
@@ -1790,6 +1790,8 @@ do_anonymous_page(struct mm_struct *mm,
 	set_pte_at(mm, addr, page_table, entry);
 	pte_unmap(page_table);
 
+	flush_tlb();
+
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, addr, entry);
 	lazy_mmu_prot_update(entry);
I still get these: (I'll test adding wbinvd)
grep[10532]: segfault at 000000000000000e rip 000000000040da42 rsp 00007fffffd12be0 error 4
sed[12834] general protection rip:40870a rsp:7fffffd189a0 error:0
Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac
Pid: 12834, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd189a0 EFLAGS: 00010213
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 00000000002f2f58
RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343
RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005
R13: 00007fffffd18bc0 R14:
20322e3031207875 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000004086f0 CR3: 00000000164ae000 CR4: 00000000000006e0 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 /proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 sed[27540] general protection rip:40870a rsp:7fffffb188d0 error:0 Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner 
bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac Pid: 27540, comm: sed Not tainted 2.6.13 RIP: 0033:[<000000000040870a>] [<000000000040870a>] RSP: 002b:00007fffffb188d0 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 7300652d002f2f58 RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343 RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565 R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005 R13: 00007fffffb18af0 R14: 20322e3031207875 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000005af24c CR3: 00000000449dc000 CR4: 00000000000006e0 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 /proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 
00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 sed[27933] general protection rip:40870a rsp:7fffffd18f50 error:0 Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac Pid: 27933, comm: sed Not tainted 2.6.13 RIP: 0033:[<000000000040870a>] [<000000000040870a>] RSP: 002b:00007fffffd18f50 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 7300652d002f2f58 RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343 RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565 R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005 R13: 00007fffffd19170 R14: 20322e3031207875 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000004086f0 CR3: 00000000603ab000 CR4: 00000000000006e0 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 /proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 
/usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 rm[32530] general protection rip:2aaaaac32260 rsp:7fffffd07f48 error:0 Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac Pid: 32530, comm: rm Not tainted 2.6.13 RIP: 0033:[<00002aaaaac32260>] [<00002aaaaac32260>] RSP: 002b:00007fffffd07f48 EFLAGS: 00010287 RAX: 65722e006e79642e RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 00007fffffd0a51e RDI: 65722e006e79642e RBP: 00007fffffd07fa0 R08: 00007fffffd07fbc R09: 00007fffffd07fb8 R10: 65722e006e79642e R11: 00002aaaaac32200 R12: 0000000000407280 R13: 00007fffffd081e0 R14: 0000000000000000 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 
ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac31c10 CR3: 0000000054b92000 CR4: 00000000000006e0 f3 a4 4c 89 d0 c3 90 90 90 90 90 90 90 90 90 90 48 89 d0 48 /proc/$$/maps: 00400000-0040a000 r-xp 00000000 03:06 5668904 /bin/rm 0050a000-0050b000 rw-p 0000a000 03:06 5668904 /bin/rm 0050b000-0052c000 rw-p 0050b000 03:06 5668904 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 sed[6922] general protection rip:40870a rsp:7fffffd196d0 error:0 Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac Pid: 6922, comm: sed Not tainted 2.6.13 RIP: 0033:[<000000000040870a>] [<000000000040870a>] RSP: 002b:00007fffffd196d0 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 000000002d002f2f RDX: 0000000000000001 RSI: 0000000000000031 RDI: 554e4728203a4343 RBP: 0000000000000031 R08: fefefefefefefeff R09: 6565656565656565 R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000005 
R13: 00007fffffd198f0 R14: 20322e3031207875 R15: 0000000000000000 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000004086f0 CR3: 000000007c113000 CR4: 00000000000006e0 48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48 /proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 Hi Still testing comment #63. 
With this patch (add an explicit wbinvd call in do_no_page)
diff -uNpr mm/memory.c.orig mm/memory.c
--- mm/memory.c.orig 2005-09-07 18:40:08.000000000 +0200
+++ mm/memory.c 2005-09-07 19:42:34.000000000 +0200
@@ -1790,6 +1790,8 @@ do_anonymous_page(struct mm_struct *mm,
 	set_pte_at(mm, addr, page_table, entry);
 	pte_unmap(page_table);
 
+	wbinvd();
+
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, addr, entry);
 	lazy_mmu_prot_update(entry);
and I still get these, so that's it for comment #63...
[OT] The Machine Check Exceptions seem to occur when I boot with a CD inside the CD-ROM drive or when I have a USB memory stick plugged in.
sed[10681] general protection rip:40870a rsp:7fffffd18560 error:0
Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac
Pid: 10681, comm: sed Not tainted 2.6.13
RIP: 0033:[<000000000040870a>] [<000000000040870a>]
RSP: 002b:00007fffffd18560 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 554e4728203a4343 RCX: 2f706f63642f2e00
RDX: 0000000000000001 RSI: 0000000000000032 RDI: 554e4728203a4343
RBP: 0000000000000032 R08: fefefefefefefeff R09: 6565656565656565
R10: 00002aaaaabc09e8 R11: 00002aaaaac30880 R12: 0000000000000004
R13: 00007fffffd18780 R14: 20322e3031207875 R15: 0000000000000000
FS: 00002aaaaadf2b00(0000) GS:ffffffff80575800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004086f0 CR3: 00000000538eb000 CR4: 00000000000006e0
48 8b 47 08 48 39 07 74 23 48 8b 53 10 48 01 c2 48 ff c0 48
/proc/$$/maps: 00400000-0041b000 r-xp 00000000 03:06 5669023 /bin/sed 0051a000-0051b000 rw-p 0001a000 03:06 5669023 /bin/sed 0051b000-00544000 rw-p 0051b000 03:06 5669023 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaaac0000-2aaaaaac1000 r--p 00000000 03:06 7389976 /usr/share/locale/en_US/LC_TELEPHONE 2aaaaaac1000-2aaaaaac2000 r--p 00000000 03:06 7389937 /usr/share/locale/en_US/LC_ADDRESS 2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 7389986 /usr/share/locale/en_US/LC_PAPER 2aaaaaac3000-2aaaaaac4000 r--p 00000000 03:06 7390161 /usr/share/locale/en_US/LC_MONETARY 2aaaaaac4000-2aaaaaac5000 r--p 00000000 03:06 7389930 /usr/share/locale/en_US/LC_NUMERIC 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 gcc[14010] general protection rip:404498 rsp:7fffffb15b20 error:0 Modules linked in: rfcomm l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore i2c_isa i2c_viapro usbhid eth1394 tg3 ide_cd cdrom ohci1394 ieee1394 loop nls_iso8859_1 nls_cp437 vfat fat tuner bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom i2c_core videodev sata_via libata scsi_mod ehci_hcd uhci_hcd usbcore video thermal processor fan button ac Pid: 14010, comm: gcc Not tainted 2.6.13 RIP: 0033:[<0000000000404498>] [<0000000000404498>] RSP: 
002b:00007fffffb15b20 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000051b6b8 RCX: 000000000051b5d0 RDX: 000000000051b5b0 RSI: 00002aaaaadee808 RDI: 0000000001000000 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 00002aaaaadee620 R11: 0000000001000010 R12: 000000000051b500 R13: 00000000005171e0 R14: 7478657400746f68 R15: 0000000000000001 FS: 00002aaaaadf2b00(0000) GS:ffffffff80575880(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaaac21c90 CR3: 0000000055058000 CR4: 00000000000006e0 41 c7 06 00 00 00 00 eb c4 66 66 66 90 66 66 66 90 66 66 66 /proc/$$/maps: 00400000-00417000 r-xp 00000000 03:06 7375558 /usr/bin/gcc-3.4.3 00516000-00518000 rw-p 00016000 03:06 7375558 /usr/bin/gcc-3.4.3 00518000-00539000 rw-p 00518000 03:06 7375558 [heap] 2aaaaaaab000-2aaaaaabf000 r-xp 00000000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaaabf000-2aaaaaac0000 rw-p 2aaaaaabf000 03:06 11911179 2aaaaabbf000-2aaaaabc0000 r--p 00014000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc0000-2aaaaabc1000 rw-p 00015000 03:06 11911179 /lib64/ld-2.3.4.so 2aaaaabc1000-2aaaaace9000 r-xp 00000000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaace9000-2aaaaade8000 ---p 00128000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaade8000-2aaaaadeb000 r--p 00127000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadeb000-2aaaaadee000 rw-p 0012a000 03:06 11911311 /lib64/tls/libc-2.3.4.so 2aaaaadee000-2aaaaadf4000 rw-p 2aaaaadee000 03:06 11911311 Downstream bug report: http://bugs.gentoo.org/show_bug.cgi?id=104151 (although no useful info to add at this point) This may be stupid, but as far as I can tell, this bug only seems to happen on AMD CPU's. Maybe it's just because Intel's chips aren't that common, or used that much in 64-bit setups, but that just strikes me as unlikely. Maybe it's because it takes some very special timing, and AMD just happens to hit it. But I get the feeling that it is more likely because of some architectural feature. 
The biggest suspect is the TLB. There's two things that AMD does
differently:
 - the AMD TLB is much bigger, iirc, with the L2 picking up more entries
   IOW, we may have a TLB flushing bug that wouldn't show with a smaller
   TLB.
 - the AMD tlb is reported to be "smarter", and a TLB flush doesn't
   necessarily flush all entries - it supposedly tracks memory contents
   for some entries with its "tlb flush filter".
Now, we _know_ there are errata in the SMP tlb flush filter. AMD documents
them in their errata sheet (errata 63 and 122: "TLB Flush Filter causes
coherency problem in multiprocessor systems").
I dunno. Does this patch (totally untested, may not compile, somebody
should check it) make any difference?
Linus
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -831,11 +831,26 @@ static void __init amd_detect_cmp(struct
 #endif
 }
 
+#define HWCR 0xc0010015
+
 static int __init init_amd(struct cpuinfo_x86 *c)
 {
 	int r;
 	int level;
 
+#if CONFIG_SMP
+	unsigned long value;
+
+	// Disable TLB flush filter by setting HWCR.FFDIS:
+	// bit 6 of msr C001_0015
+	//
+	// Errata 63 for SH-B3 steppings
+	// Errata 122 for all(?) steppings
+	rdmsrl(HWCR, value);
+	value |= 1 << 6;
+	wrmsrl(HWCR, value);
+#endif
+
 	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
 	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
 	clear_bit(0*32+31, &c->x86_capability);
On Sat, 2005-09-17 at 11:04 -0700, Linus Torvalds wrote:
> This may be stupid, but as far as I can tell, this bug only seems to
> happen on AMD CPU's.
>
> Maybe it's just because Intel's chips aren't that common, or used that
> much in 64-bit setups, but that just strikes me as unlikely.
>
> Maybe it's because it takes some very special timing, and AMD just happens
> to hit it.
>
> But I get the feeling that it is more likely because of some architectural
> feature. The biggest suspect is the TLB. There's two things that AMD does
> differently:
>
> - the AMD TLB is much bigger, iirc, with the L2 picking up more entries
>
> IOW, we may have a TLB flushing bug that wouldn't show with a smaller
> TLB.
>
> - the AMD tlb is reported to be "smarter", and a TLB flush doesn't
> necessarily flush all entries - it supposedly tracks memory contents
> for some entries with its "tlb flush filter".
>
> Now, we _know_ there are errata in the SMP tlb flush filter. AMD documents
> them in their errata sheet (errata 63 and 122: "TLB Flush Filter causes
> coherency problem in multiprocessor systems").
>
> I dunno. Does this patch (totally untested, may not compile, somebody
> should check it) make any difference?
Everybody seems to want to blame all kinds of bugs on that one,
I get asked about it all the time :)
It might be worth a try, but merging that particular patch would be a
mistake because it doesn't limit the steppings where the flush filter
is disabled (so please don't merge it). If anything it should be limited
to max E6 stepping, which is known to have this problem. On later
CPUs it might have unintended side effects. Also I would wait
for feedback from people.
Davej had a similar patch in fedora iirc so he might know if the address
space randomization problem still happens there. My feeling is that
that bug is too easy to hit on some setups so that it could be this
particular erratum.
-Andi
>
> Linus
>
> diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
> --- a/arch/x86_64/kernel/setup.c
> +++ b/arch/x86_64/kernel/setup.c
> @@ -831,11 +831,26 @@ static void __init amd_detect_cmp(struct
> #endif
> }
>
> +#define HWCR 0xc0010015
> +
> static int __init init_amd(struct cpuinfo_x86 *c)
> {
> int r;
> int level;
>
> +#if CONFIG_SMP
> + unsigned long value;
> +
> + // Disable TLB flush filter by setting HWCR.FFDIS:
> + // bit 6 of msr C001_0015
> + //
> + // Errata 63 for SH-B3 steppings
> + // Errata 122 for all(?) steppings
> + rdmsrl(HWCR, value);
> + value |= 1 << 6;
> + wrmsrl(HWCR, value);
> +#endif
> +
> /* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
> 3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
> clear_bit(0*32+31, &c->x86_capability);
On Sat, 17 Sep 2005, Linus Torvalds wrote:
>
> Now, we _know_ there are errata in the SMP tlb flush filter. AMD documents
> them in their errata sheet (errata 63 and 122: "TLB Flush Filter causes
> coherency problem in multiprocessor systems").
Btw, this particular errata would also happen to explain why this bug
only started happening with the 4-level pages.
With the old three-level page tables, mm context switches were done by
just switching the highest level entry in one global page table. That will
necessarily invalidate the whole flush filter, and thus basically disable
it in practice over any context switch.
With the four-level page tables, a MM context switch actually switches the
whole page table, and the flush filter thus remains active over a context
switch. Thus making any flush filter bugs much more likely to trigger.
NOTE! This is still just a theory of mine, based on documentation and some
personal assumptions about how the flush filter works. But it does seem to
make sense. I'd love to hear if the patch makes any difference to
behaviour..
Linus
On Sat, 17 Sep 2005, Andi Kleen wrote:
>
> It might be worth a try, but merging that particular patch would be a
> mistake because it doesn't limit the steppings where the flush filter
> is disabled (so please don't merge it). If anything it should be limited
> to max E6 stepping, which is known to have this problem. On later
> CPUs it might have unintended side effects. Also I would wait
> for feedback from people.
If the flush filter is already disabled, the thing should have no impact,
so I don't see the point in limiting it to anything else. The errata
sheets just say "disable it by setting HWCR.FFDIS", there are no
limitations on it (like some of the other errata fixes that say "only do
this for steppings C0 and higher" or similar).
And yes, maybe it could be limited to the E6 stepping, but basically right
now that means every single CPU out there in the wild.
So from a testing standpoint, the patch looks fine (assuming I didn't
introduce some silly typo or something). From a "let's commit it"
standpoint, I obviously want to hear if it makes any difference, but
considering the pain this has caused for us, if I get even a single report
that it fixes the problem, I _am_ going to commit that fix without any
further questions.
So if we confirm this to fix the problem, and if we then get confirmation
from AMD that future CPU's have that thing fixed, we can limit it _then_.
It still leaves the flush filter on for non-SMP configs. We could allow it
for CONFIG_SMP when only one CPU is found, but I don't think many people
do that (the distributions seem to always have separate UP/SMP kernels, so
at most you might run an SMP kernel on a UP machine at install time).
Linus
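[Editorial sketch] The claim above, that setting HWCR.FFDIS has no effect when the flush filter is already disabled, follows from the update being a plain bit-set. A minimal model of that read-modify-write is shown here; the helper names are hypothetical, and only the MSR number and bit position come from the patch in this thread.

```c
#include <stdint.h>

/* MSR number and bit position as given in the patch in this thread. */
#define MSR_HWCR   0xc0010015u
#define HWCR_FFDIS (1ULL << 6)   /* TLB flush filter disable */

/* Hypothetical helper: HWCR value with FFDIS set, other bits untouched. */
static uint64_t hwcr_disable_flush_filter(uint64_t hwcr)
{
	return hwcr | HWCR_FFDIS;
}

/* Hypothetical helper: non-zero if the flush filter is already disabled. */
static int hwcr_flush_filter_disabled(uint64_t hwcr)
{
	return (hwcr & HWCR_FFDIS) != 0;
}
```

Applying hwcr_disable_flush_filter() to a value that already has bit 6 set returns the same value, which is the "no impact" case. This models only the bit arithmetic, not any hardware side effects of writing the MSR.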
I have applied your patch (manually) to my earliest known bad kernel;
2.6.11.12 with Arjan's randomize_va_mappings patches.
Built, booted and built gcc 3.3.6 with no segfaults/gpfs, which is
encouraging since it rarely succeeded before.
Currently building gcc 3.3.6 and gcc 4.0.x both at -j4. If that succeeds
then I think you might have solved it. Will report back in a few minutes
with progress report.
[This is on a Tyan K8W with dual 250 opterons and 2Gb/cpu]
All builds completed; no problems at all, so it's now looking very likely
indeed that you have 'put your finger on it', so to speak.
I'll build a complete distro overnight to really hammer it, and I'll patch up
the latest kernel version tomorrow and confirm that it is also OK
$ uname -r -v -m
2.6.11.12-debug #1 SMP Sat Sep 17 21:32:21 BST 2005 x86_64
$ cat /proc/sys/kernel/randomize_va_space
1
Questions:
How much will this patch cripple my dual opteron machines?
Is this something that could be fixed up with microcode/bios upgrades?
Andrew
On Sat, 17 Sep 2005 bugme-daemon@kernel-bugs.osdl.org wrote:
>
> All builds completed; no problems at all, so it's now looking very likely
> indeed that you have 'put your finger on it', so to speak.
Goodie.
> I'll build a complete distro overnight to really hammer it, and I'll patch up
> the latest kernel version tomorrow and confirm that it is also OK
Thanks.
> Questions:
>
> How much will this patch cripple my dual opteron machines?
> Is this something that could be fixed up with microcode/bios upgrades?
It's not likely to be a huge performance hit. As mentioned, earlier
kernels had effectively disabled this anyway for a large number of TLB
invalidates, since they wrote the top-level page directory on task
switch, and task switching is the most likely thing to actually get helped
by this.
But the cost of TLB re-loads due to TLB invalidates is negligible under
most loads - usually you'd not switch back and forth that much.
So yes, it will hit a few loads, and I bet you could benchmark it to see
the effect, but I don't think you'll likely see it in any noticeable ways.
And no, a ucode/bios upgrade is unlikely to fix it. A BIOS upgrade might
have _hidden_ the bug (by having the BIOS do the same workaround that the
patch does), but the bug itself will almost certainly need a new CPU mask
revision to fix. I doubt any of this is really microcode: there's some
custom hardware to do TLB invalidate tracking on cache dirty and
invalidate time, and there's likely some case they just missed.
It's a bit sad, since it's a really clever feature and I like it, but it
will get fixed eventually and then we can remove the workaround.
Linus
A patched up 2.6.13.1 also seems fine. I've left a big distro build job
running but don't expect any problems given results so far.
I'd suggest an early stable branch release with this patch (or a modified
version; Andi?).
Great work - Thanks!
Andrew
Reply-To: davej@redhat.com
On Sat, Sep 17, 2005 at 08:46:07PM +0200, Andi Kleen wrote:
> Davej had a similar patch in fedora iirc so he might know if the address
> space randomization problem still happens there. My feeling is that
> that bug is too easy to hit on some setups so that it could be this
> particular erratum.
Initial test results look positive. Before, people were seeing all sorts
of strange things ranging from bad pmd's, to bad swap entry msgs.
One of the Fedora users affected by this problem came up with a userspace
program that poked /dev/cpu/*/msr, which did the trick for him.
As an experiment I merged a similar patch to our kernel, which also seems
to have fixed things for those affected by this issue before.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=155857
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164941

Dave

With that setup TLB patch, I observed no segfaults on my dual Opteron while it was under heavy load for several hours, compiling with -j4 and playing games under cedega, with randomize_va_space set to 1. Average load during this period ranged between 4.5 and 5.0. Previously I would get crashes in my syslog with any kernel above 2.6.11 unless I did "echo 0 > /proc/sys/kernel/randomize_va_space".

My system:
kernel 2.6.13-ck5 + setup TLB patchlet
MSI K8T Master2 FAR, 2x244 Opterons, 1GB registered DDR400
Gentoo 2005.1 with glibc amd64 speedup patches (glibc_overlay)

Actually BIOS updates are supposed to fix it - according to AMD, the BIOS is supposed to disable the flush filter on affected revisions. Unfortunately quite a lot of Taiwanese vendors are not very good at adding errata workarounds... The problem is that I don't want to disable it for all revisions because it could cause problems on future CPUs. I guess it would be OK if it's limited by stepping (only up to E7).

Just to round out the story:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/33340.pdf

Hi Linus,
Your patch fixes the problem for me too. I've been testing it for a whole day.
Thanx

Linus, I also think your HWCR.FFDIS catch is on target. After doing a BIOS update on my Tyan K8SE motherboard I found that this bug stopped happening. Rather than use your patch to set HWCR.FFDIS in init_amd(), I added some code to print out the current setting and, sure enough, it is already set - presumably by the new BIOS. I am unable to confirm that the old BIOS did not set it because I don't want to revert, but it seems like a safe bet it didn't.
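
For anyone who wants to repeat the "is FFDIS already set?" check without patching the kernel, here is a rough userspace sketch that reads the register through /dev/cpu/*/msr. It assumes HWCR is MSR C001_0015 and FFDIS is bit 6 on K8, that the msr driver is loaded and that you run it as root. The helper names (ffdis_set, read_msr, report) are made up for this example, and the result is only meaningful on K8 Opteron/Athlon 64 parts.

```python
import glob
import os
import struct

MSR_K8_HWCR = 0xC0010015  # assumed: K8 Hardware Configuration Register
FFDIS_BIT = 6             # assumed: TLB flush filter disable bit on K8


def ffdis_set(hwcr_value):
    """Return True if the flush-filter-disable bit is set in a raw HWCR value."""
    return bool((hwcr_value >> FFDIS_BIT) & 1)


def read_msr(cpu_path, msr):
    """Read one 64-bit MSR through the msr driver (needs root, 'modprobe msr')."""
    fd = os.open(cpu_path, os.O_RDONLY)
    try:
        # The MSR number is used as the file offset; each MSR is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def report():
    """Print the FFDIS state for every CPU that exposes an msr device node."""
    for path in sorted(glob.glob("/dev/cpu/[0-9]*/msr")):
        try:
            value = read_msr(path, MSR_K8_HWCR)
        except OSError as exc:
            # Typical causes: not root, or a CPU that does not implement this MSR.
            print(f"{path}: cannot read HWCR ({exc})")
            continue
        state = "set" if ffdis_set(value) else "clear"
        print(f"{path}: HWCR={value:#x} FFDIS={state}")


if __name__ == "__main__":
    report()
```

If every CPU reports FFDIS as set, the BIOS (or a patched kernel) has already applied the workaround. This script only reads the register and does not change it.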
I am still getting segfaults in my log after applying the HWCR.FFDIS patch, but I have not had any of the general protection faults:

[ 5907.301418] flipflop[11381]: segfault at 0000000000000000 rip 00002aaaaac2082e rsp 00007fffffce84d0 error 6
[ 7109.144800] lament[11443]: segfault at 0000000000000000 rip 00002aaaab5ac82e rsp 00007fffffd23780 error 6

I think that the segfaults might be another issue, maybe just the screensaver not behaving?

Now that 2.6.14 is finally here, I'm closing this bug report.

*** Bug 4991 has been marked as a duplicate of this bug. ***
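
For reference, the "error" field in the two segfault lines from the last report can be decoded with a small sketch like the one below. It assumes the standard x86 page-fault error-code bits (bit 0 present/protection, bit 1 write, bit 2 user mode) and that the log line has the shape shown in that report. The function names are made up for this example.

```python
import re

# x86 page-fault error-code bits: (mask, meaning if set, meaning if clear)
PF_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write", "read"),
    (0x4, "user mode", "kernel mode"),
]

# Matches lines shaped like the ones quoted in the report.
SEGFAULT_RE = re.compile(
    r"([^\s\[]+)\[(\d+)\]: segfault at ([0-9a-f]+) "
    r"rip ([0-9a-f]+) rsp ([0-9a-f]+) error ([0-9a-f]+)"
)


def decode_pf_error(code):
    """Return a list of human-readable meanings for a page-fault error code."""
    return [if_set if code & mask else if_clear
            for mask, if_set, if_clear in PF_BITS]


def parse_segfault(line):
    """Parse one kernel 'segfault at' log line into a dict, or return None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    comm, pid, addr, rip, rsp, error = m.groups()
    return {
        "comm": comm,
        "pid": int(pid),
        "addr": int(addr, 16),
        "rip": int(rip, 16),
        "rsp": int(rsp, 16),
        "error": int(error, 16),  # assumed to be printed in hex
    }


if __name__ == "__main__":
    line = ("[ 5907.301418] flipflop[11381]: segfault at 0000000000000000 "
            "rip 00002aaaaac2082e rsp 00007fffffce84d0 error 6")
    rec = parse_segfault(line)
    print(rec["comm"], hex(rec["addr"]), decode_pf_error(rec["error"]))
```

Read this way, "segfault at 0 ... error 6" is a user-mode write to a not-present page at address 0, i.e. a plain NULL pointer write in the program. That is consistent with the guess that these are application bugs rather than the flush filter problem.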