Bug 10788

Summary: the kernel makes openoffice,vlc,mplayer,kaffeine segfault
Product: Platform Specific/Hardware
Reporter: GNUtoo (GNUtoo)
Component: i386
Assignee: platform_i386
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: high
CC: akpm, bunk, hughd, mingo, randy.dunlap, suresh.b.siddha, torvalds, venki
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.26-rc3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 10492
Attachments:
  my .config of the kernel
  009-allow-ap-vlan-modes.patch
  2.6.26-rc3-dmesg
  Can you try whether _reverting_ this patch fixes it?

Description GNUtoo 2008-05-24 14:23:13 UTC
Latest working kernel version:2.6.25
Earliest failing kernel version:2.6.26-rc3
Distribution:Gentoo
Hardware Environment:
Software Environment:
Problem Description:just after the kernel update (that made my ath5k wireless card work) all the programs mentioned here stopped working...
maybe i should try 2.6.26-rc2...


Steps to reproduce:
run openoffice, vlc, mplayer or kaffeine (e.g. play a youtube video with the players)

here is the end of the strace output of vlc:
open("/usr/lib/libavutil.so.49", O_RDONLY) = 6
read(6, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\24\0\0004\0\0\0"..., 512) = 512
fstat64(6, {st_mode=S_IFREG|0755, st_size=45958, ...}) = 0
mmap2(NULL, 41680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0xb6933000
mmap2(0xb693a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x6) = 0xb693a000
mmap2(0xb693c000, 4816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb693c000
close(6)                                = 0
open("/usr/lib/libvorbis.so.0", O_RDONLY) = 6
read(6, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P*\0\0004\0\0\0"..., 512) = 512
fstat64(6, {st_mode=S_IFREG|0755, st_size=164132, ...}) = 0
mmap2(NULL, 162996, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0xb690b000
mmap2(0xb6925000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x1a) = 0xb6925000
close(6)                                = 0
mprotect(0xb693a000, 4096, PROT_READ)   = 0
mprotect(0xb6a73000, 4096, PROT_READ)   = 0
mprotect(0xb6a85000, 4096, PROT_READ)   = 0
mprotect(0xb6a9a000, 626688, PROT_READ|PROT_WRITE) = 0
mprotect(0xb6a9a000, 626688, PROT_READ|PROT_EXEC) = 0
mprotect(0xb6b33000, 4096, PROT_READ)   = 0
mprotect(0xb6c42000, 4096, PROT_READ)   = 0
mprotect(0xb6c7e000, 4096, PROT_READ)   = 0
mprotect(0xb6c95000, 3399680, PROT_READ|PROT_WRITE) = 0
mprotect(0xb6c95000, 3399680, PROT_READ|PROT_EXEC) = 0
mprotect(0xb6fd3000, 28672, PROT_READ)  = 0
mprotect(0xb7147000, 8192, PROT_READ)   = 0
mprotect(0xb728d000, 32768, PROT_READ|PROT_WRITE) = 0
mprotect(0xb728d000, 32768, PROT_READ|PROT_EXEC) = 0
mprotect(0xb7295000, 4096, PROT_READ)   = 0
mprotect(0xb72a8000, 4096, PROT_READ)   = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Comment 1 GNUtoo 2008-05-24 14:26:10 UTC
Created attachment 16270
my .config of the kernel

note that some security features added in 2.6.25 are present
maybe that could be the cause of the problem
Comment 2 GNUtoo 2008-05-24 14:27:46 UTC
i have glibc-2.6.1
Comment 3 GNUtoo 2008-05-24 14:46:42 UTC
Created attachment 16271
009-allow-ap-vlan-modes.patch

ah yes...i forgot...i've added a patch...i attached it here...
Comment 4 Andrew Morton 2008-05-24 15:34:13 UTC
Are there any interesting messages in the kernel logs?  /bin/dmesg?
Comment 5 GNUtoo 2008-05-24 16:07:05 UTC
unfortunately no...
i can attach my whole dmesg if you want...
should i recompile my kernel with some debug things?
Comment 6 GNUtoo 2008-05-24 16:10:38 UTC
Created attachment 16272
2.6.26-rc3-dmesg

as there are memory related things at the beginning i added the whole dmesg
Comment 7 GNUtoo 2008-05-24 16:25:57 UTC
2.6.26-rc2 also has the same bug
Comment 8 GNUtoo 2008-05-24 16:28:37 UTC
oops i booted the wrong kernel...don't know why
Comment 9 GNUtoo 2008-05-24 16:34:23 UTC
2.6.26-rc2 is fine...so the bug was introduced between rc2 and rc3
Comment 10 Adrian Bunk 2008-05-25 10:39:48 UTC
Does an unpatched 2.6.26-rc3 kernel show the same problem?

If yes, does enabling CONFIG_X86_PAT help?
Comment 11 GNUtoo 2008-05-25 12:21:48 UTC
unpatched does the same thing...
Comment 12 GNUtoo 2008-05-25 12:32:07 UTC
enabling CONFIG_X86_PAT doesn't change anything...
Comment 13 GNUtoo 2008-05-25 12:32:37 UTC
oops it was the trace of mplayer not vlc...
Comment 14 GNUtoo 2008-05-25 12:34:05 UTC
wow...this time i have a message in dmesg:
vlc[6546]: segfault at b02299dc ip b02299dc sp b2153bec error 15 in libxvidcore.so.4.1[b0220000+99000]
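
A rough way to read a line like the one above, as a sketch only: it assumes the "error" field is hexadecimal and uses the hardware-defined x86 page-fault error code bits. The helper names (is_exec_protection_fault, describe_fault, offset_in_mapping) are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits (defined by the CPU architecture). */
#define PF_PROT  0x01UL /* 1: protection violation, 0: page not present */
#define PF_WRITE 0x02UL /* 1: write access, 0: read access */
#define PF_USER  0x04UL /* 1: fault raised in user mode */
#define PF_RSVD  0x08UL /* 1: reserved bit set in a page-table entry */
#define PF_INSTR 0x10UL /* 1: fault raised by an instruction fetch */

/* Hypothetical helper: true if user code tried to execute a present page. */
int is_exec_protection_fault(unsigned long err)
{
    const unsigned long want = PF_PROT | PF_USER | PF_INSTR;
    return (err & want) == want;
}

/* Hypothetical helper: write a short description of the error code to buf. */
void describe_fault(unsigned long err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s %s in %s mode%s",
             (err & PF_PROT) ? "protection violation on" : "not-present page on",
             (err & PF_INSTR) ? "instruction fetch"
                              : (err & PF_WRITE) ? "write" : "read",
             (err & PF_USER) ? "user" : "kernel",
             (err & PF_RSVD) ? " (reserved bit set)" : "");
}

/* Hypothetical helper: offset of addr inside a "[base+size]" mapping,
 * or -1 if addr lies outside the mapping. */
long offset_in_mapping(unsigned long addr, unsigned long base, unsigned long size)
{
    if (addr < base || addr - base >= size)
        return -1;
    return (long)(addr - base);
}
```

Read this way, error 15 is 0x15: a user-mode instruction fetch from a page that is present but not executable. The faulting address equals ip, and offset_in_mapping(0xb02299dc, 0xb0220000, 0x99000) puts it at offset 0x99dc inside libxvidcore.so.4.1.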
Comment 15 Andrew Morton 2008-05-25 13:22:21 UTC
Well I don't know what could have caused this and afaik nobody
else is hitting it.

So I'm afraid I'll have to ask if you are able to perform the
dreaded git bisection search.  It will take a couple of hours.


http://www.kernel.org/doc/local/git-quick.html has the instructions.

Thanks.
Comment 16 Adrian Bunk 2008-05-25 13:27:47 UTC
What hardware is your computer (especially which CPU)?
Comment 17 Adrian Bunk 2008-05-25 13:29:57 UTC
Created attachment 16276
Can you try whether _reverting_ this patch fixes it?
Comment 18 GNUtoo 2008-05-25 14:17:13 UTC
(In reply to comment #16)
> What hardware is your computer (especially which CPU)?
> 

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 13
model name      : Intel(R) Pentium(R) M processor 2.00GHz
stepping        : 8
cpu MHz         : 2000.000
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx bts est tm2
bogomips        : 3992.51
clflush size    : 64
Comment 19 GNUtoo 2008-05-25 14:17:54 UTC
my computer is a Sony Vaio laptop (VGN-BX297XP)
Comment 20 GNUtoo 2008-05-25 14:25:55 UTC
tremulous and nexuiz also segfault...
i'll revert the patch...
does anyone need more strace output? with vlc the traces are different
Comment 21 GNUtoo 2008-05-25 14:53:13 UTC
(In reply to comment #17)
> Created an attachment (id=16276)
> Can you try whether _reverting_ this patch fixes it?
> 

with this patch reverted and CONFIG_X86_PAT enabled it works again as expected (the bug disappeared)
Comment 22 Adrian Bunk 2008-05-25 14:57:27 UTC
Thanks for testing!

Caused by:

commit 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Wed May 14 16:05:51 2008 -0700

    mprotect: prevent alteration of the PAT bits
    
    There is a defect in mprotect, which lets the user change the page cache
    type bits by-passing the kernel reserve_memtype and free_memtype
    wrappers.  Fix the problem by not letting mprotect change the PAT bits.
    
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment 23 Adrian Bunk 2008-05-25 14:59:56 UTC
Does reverting the patch and disabling CONFIG_X86_PAT again also fix the problem?
Comment 24 GNUtoo 2008-05-25 15:24:35 UTC
(In reply to comment #23)
> Does reverting the patch and disabling CONFIG_X86_PAT again also fix the
> problem?
> 
yes it also fixes the problem
Comment 25 Hugh Dickins 2008-05-26 01:42:03 UTC
I'm fairly sure you'll find these segfaults were fixed in 2.6.26-rc3-git2
and so in current git: please retest with a recent kernel when you can.

The PAT pte_modify() changes fell foul of a misdefined PTE_MASK when NX
is used: see http://lkml.org/lkml/2008/5/19/446 for a link into the
mailthread; but Linus rightly put in Jeremy Fitzhardinge's proper
PTE_MASK fixes instead of the two-line hack in that mail.
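
To illustrate the kind of width problem described here, a minimal sketch that does not use the kernel's actual definitions: NARROW_MASK, WIDE_MASK, modify_prot and nx_survives are hypothetical names. The assumption is that the "keep" mask lives in a signed 32-bit type and gets sign-extended when combined with a 64-bit PAE entry, so the NX bit (bit 63) ends up among the preserved bits and a later change back to an executable protection cannot clear it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;              /* a PAE page-table entry is 64 bits wide */

#define PAGE_SHIFT 12
#define PTE_NX (UINT64_C(1) << 63)   /* no-execute bit */

/* Hypothetical keep-mask held in a signed 32-bit type (bit pattern 0xfffff000). */
static const int32_t NARROW_MASK = -(INT32_C(1) << PAGE_SHIFT);

/* Hypothetical keep-mask built at full entry width: address bits 12..51 only. */
static const pte_t WIDE_MASK =
    ((UINT64_C(1) << 52) - 1) & ~((UINT64_C(1) << PAGE_SHIFT) - 1);

/* Keep the bits selected by `keep` from the old entry and take every
 * other bit from the new protection value. */
pte_t modify_prot(pte_t old, pte_t newprot, pte_t keep)
{
    return (old & keep) | (newprot & ~keep);
}

/* Returns 1 if an NX bit set in the old entry survives a switch to an
 * executable protection, 0 if it is cleared as intended. */
int nx_survives(pte_t keep)
{
    assert((keep & 0xfff) == 0);     /* the mask must not cover the low flag bits */

    const pte_t old = UINT64_C(0x12345000) | PTE_NX | 0x7; /* present, rw, user, no-exec */
    const pte_t exec_prot = 0x5;                           /* present, user, executable */

    return (modify_prot(old, exec_prot, keep) & PTE_NX) != 0;
}
```

Passing NARROW_MASK converts it to 0xfffffffffffff000, which includes bit 63, so nx_survives() reports that NX stays set; with WIDE_MASK it is cleared. That would match the strace above, where regions go PROT_READ|PROT_WRITE and then back to PROT_READ|PROT_EXEC just before the crash.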
Comment 26 Adrian Bunk 2008-05-26 01:43:11 UTC
Short summary of the discussion in this bug:

Regression introduced between 2.6.26-rc2 and 2.6.26-rc3.

Caused by commit 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b (mprotect: prevent alteration of the PAT bits).

It does *not* matter whether CONFIG_X86_PAT is enabled or disabled in the kernel - according to the submitter it breaks in both cases and reverting the commit fixes it in both cases.
Comment 27 Adrian Bunk 2008-05-26 01:48:47 UTC
(In reply to comment #25)
> I'm fairly sure you'll find these segfaults were fixed in 2.6.26-rc3-git2
> and so in current git: please retest with a recent kernel when you can.
> 
> The PAT pte_modify() changes fell foul of a misdefined PTE_MASK when NX
> is used: see http://lkml.org/lkml/2008/5/19/446 for a link into the
> mailthread; but Linus rightly put in Jeremy Fitzhardinge's proper
> PTE_MASK fixes instead of the two-line hack in that mail.

Your comment came after I started writing my comment.

Just to avoid misunderstandings: my comment was not meant as a reply to yours.
Comment 28 Linus Torvalds 2008-05-26 10:12:24 UTC
This is the known PAE bug. It's fixed in current -git by commit 2bd3a99c9d1851182f73d0a024dc5bdb0a470e8c ("x86: define PTE_MASK in a universally useful way") and will be in -rc4 which is planned for later today.
Comment 29 Adrian Bunk 2008-05-30 00:55:53 UTC
As described in the bug it should already be fixed.

Please reopen this bug if it's still present with kernel 2.6.26-rc4.