Bug 10412
| Summary: | Memory corruption a_ops or 'mapping' pointer | | |
|---|---|---|---|
| Product: | Memory Management | Reporter: | Plamen Petrov (plamen.sisi) |
| Component: | Other | Assignee: | Jan Kara (jack) |
| Status: | CLOSED INVALID | | |
| Severity: | blocking | CC: | jack |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | linux 2.6.25-rc8-00166-g6fdf5e6 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 9832 | | |
| Attachments: | gzipped config used; dmesg output; lspci -vv output | | |
Description
Plamen Petrov
2008-04-07 02:33:04 UTC
Created attachment 15644 [details]
gzipped config used
This is the kernel config used, copied directly from /proc.
Created attachment 15645 [details]
dmesg output
output of dmesg
Created attachment 15646 [details]
lspci -vv output
lspci -vv output
First report reference: http://lkml.org/lkml/2008/4/7/36

This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline.

Can you identify the piece of code corresponding to set_page_dirty+0x34, please?

Please keep the bugme-daemon address on the CC list, so that the bug tracker
can pick up your replies automatically.
On Monday, 7 of April 2008, Plamen Petrov wrote:
> bugme-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10412
> >
> > ------- Comment #6 from rjw@sisk.pl 2008-04-07 03:25 -------
> > Can you identify the piece of code corresponding to set_page_dirty+0x34,
> > please?
> >
> Sorry, I'm not sure what exactly you want me to do...
> And, frankly, I'm not sure if I am even capable of doing it...
You can use gdb for this purpose. Please go to the directory where you have
compiled the kernel and run "gdb vmlinux". Then, under gdb, execute
"l *set_page_dirty+0x34" and it should show you which line of code this address
corresponds to.
This will work if your kernel has been compiled with CONFIG_DEBUG_INFO set.
Rafael J. Wysocki wrote:
> Please keep the bugme-daemon address on the CC list, so that the bug tracker
> can pick up your replies automatically.
>
Will do, sorry.

> On Monday, 7 of April 2008, Plamen Petrov wrote:
>> bugme-daemon@bugzilla.kernel.org wrote:
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10412
>>>
>>> ------- Comment #6 from rjw@sisk.pl 2008-04-07 03:25 -------
>>> Can you identify the piece of code corresponding to set_page_dirty+0x34,
>>> please?
>>>
>> Sorry, I'm not sure what exactly you want me to do...
>> And, frankly, I'm not sure if I am even capable of doing it...
>
> You can use gdb for this purpose. Please go to the directory where you have
> compiled the kernel and run "gdb vmlinux". Then, under gdb, execute
> "l *set_page_dirty+0x34" and it should show you which line of code this
> address corresponds to.
>
> This will work if your kernel has been compiled with CONFIG_DEBUG_INFO set.

Unfortunately, I will be away from the machine for at least the next 6 hours
or so - the debug will have to wait.

Just checked - excerpt from my kernel config:
...
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
...

If I re-compile the same kernel with CONFIG_DEBUG_INFO set, will it do?
Or should I go to latest git available, and enable CONFIG_DEBUG_INFO?

Thanks,

The piece of code that oopses is (use "linux/scripts/decodecode" to see it from the oops):

    12: 48 8b 80 b8 00 00 00    mov 0xb8(%rax),%rax
    19: 48 c7 c2 d0 08 2c 80    mov $0xffffffff802c08d0,%rdx
    20: 48 8b 40 20             mov 0x20(%rax),%rax
    24: 48 85 c0                test %rax,%rax
    27: 48 0f 44 c2             cmove %rdx,%rax
    ** 0: ff d0                 callq *%rax   <--------- THIS
    2: 85 c0                    test %eax,%eax
    4: 89 c5                    mov %eax,%ebp
    6: 74 21                    je 0x29

and it gets a GP fault on the "callq" to never-never-land due to RAX being corrupt (RAX: 6b636f6c5f747369). Read in memory (little-endian) byte order, that RAX value is the ASCII string "ist_lock", not a pointer.
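The claim that RAX holds text rather than an address can be checked by hand. Below is a minimal Python sketch, assuming x86-64 little-endian byte order. The helper name `register_as_ascii` is hypothetical and only used for illustration; the input value is the RAX from the oops.

```python
def register_as_ascii(value: int, width: int = 8) -> str:
    """Render a register value's bytes in memory (little-endian) order as ASCII."""
    raw = value.to_bytes(width, byteorder="little")
    # Non-printable bytes are shown as '.' so arbitrary values stay readable.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


rax = 0x6B636F6C5F747369  # RAX value reported in the oops
print(register_as_ascii(rax))  # prints: ist_lock
```

Eight printable bytes in a register that should hold a function pointer suggest that the pointer was loaded from memory which had been overwritten with string data. That is an interpretation of the oops, not something it proves on its own.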
The code itself comes from

    int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;

    if (!spd)
        spd = __set_page_dirty_buffers;
    return (*spd)(page);

(in __set_page_dirty() - inlined), so it looks like "mapping" or "a_ops" was corrupted.

The "Scheduling while atomic" part is uninteresting. It's just a result of this earlier oops (we killed the process while it was holding locks).

Plamen, could you try running memtest (to rule out the possibility of a single-bit error in the pointer) and, if it doesn't find anything, try enabling CONFIG_DEBUG_SLAB to catch memory corruption?

bugme-daemon@bugzilla.kernel.org wrote:
> Plamen, could you try running memtest (to rule out possibility of a
> single-bit error in the pointer) and if it doesn't find anything, try
> enabling CONFIG_DEBUG_SLAB and to catch memory corruption.
>

Well, I must admit I only ran memtest when I assembled that machine six months or so ago...
Anyway, yesterday I changed my motherboard - the new one is a GA-P35-DS3R from Gigabyte.
What I can do is enable CONFIG_DEBUG_SLAB, but with this new mobo - I have no problems at all.
I'll keep you informed if anything comes up...

OK, I'll close the bug as invalid for now; please reopen it in case you see the same problem again.

I'm also changing the subject of the bug to better match the real problem...