Bug 5238
Summary: | crash during/after pvmove | | |
---|---|---|---|
Product: | IO/Storage | Reporter: | Hans-Peter Bock (xbk) |
Component: | LVM2/DM | Assignee: | Alasdair G Kergon (agk) |
Status: | CLOSED INSUFFICIENT_DATA | | |
Severity: | blocking | CC: | bunk, diegocg |
Priority: | P2 | | |
Hardware: | i386 | | |
OS: | Linux | | |
Kernel Version: | 2.6.13, 2.6.12y (and 2.6.10 also) | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Description
Hans-Peter Bock
2005-09-13 00:11:35 UTC
bugme-daemon@kernel-bugs.osdl.org wrote:
> > Unable to handle kernel paging request at virtual address e08c00a4

I'd be suspecting a flipped bit in memory: the kernel wanted 0xc08c00a4 there. Is it reproducible? How good is the hardware? Suggest you run memtest86 on it for 24 hours.

Running memtest is difficult, since the machine is our institute's file server. I'll try to run it over the weekend, when most employees are not working, and report the results then.

I ran memtest86 yesterday for 13.5 hours. It did not detect any errors in the memory system. The system has also had uptimes of up to 200 days without any errors before (no more than 200 because of maintenance or kernel upgrades).

Today I had another crash during pvmove on a different computer:

ksymoops 2.4.9 on i686 2.6.13. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.13/ (default)
-m /boot/System.map-2.6.13 (default)

Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Machine check exception polling timer started.
0000:00:03.0: Realtek RTL8201 PHY transceiver found at address 1.
0000:00:03.0: Using transceiver found at address 1 as default
ehci_hcd 0000:00:0d.3: debug port 1
Unable to handle kernel paging request at virtual address f8ebd000
f8ea9a6e
*pde = 3744f067
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<f8ea9a6e>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246 (2.6.13)
eax: f8ebc000 ebx: f6ffa78c ecx: f8eae120 edx: 00008000
esi: 00008000 edi: 00000000 ebp: e58abf3c esp: e58abee0
ds: 007b es: 007b ss: 0068
Stack:
f8eaa20a d0981c20 00008000 00000001 d0a15860 f6ffa78c f8eaad45 f6ffa78c
00008000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
f6ffa780 f8eae1e8 00000000 f8eaaf40 f8eaaf33 f6ffa780 e58abf3c 00000000
Call Trace:
[<f8eaa20a>] rh_state+0x3a/0x60 [dm_mirror]
[<f8eaad45>] do_writes+0x85/0x200 [dm_mirror]
[<f8eaaf40>] do_work+0x0/0x60 [dm_mirror]
[<f8eaaf33>] do_mirror+0x73/0x80 [dm_mirror]
[<f8eaaf69>] do_work+0x29/0x60 [dm_mirror]
[<c012e671>] worker_thread+0x1b1/0x260
[<c011baf0>] default_wake_function+0x0/0x20
[<c012e4c0>] worker_thread+0x0/0x260
[<c01324e6>] kthread+0xb6/0xc0
[<c0132430>] kthread+0x0/0xc0
[<c0101369>] kernel_thread_helper+0x5/0xc
Code: 04 8b 54 24 08 8b 40 04 8b 40 18 0f a3 10 19 d2 31 c0 85 d2 0f 95 c0 c3 90 8d 74 26 00 8b 44 24 04 8b 54 24 08 8b 40 04 8b 40 1c <0f> a3 10 19 d2 31 c0 85 d2 0f 95 c0 c3 90 8d 74 26 00 31 c0 c3

>>EIP; f8ea9a6e <pg0+38a86a6e/3fbdb400> <=====
>>eax; f8ebc000 <pg0+38a99000/3fbdb400>
>>ebx; f6ffa78c <pg0+36bd778c/3fbdb400>
>>ecx; f8eae120 <pg0+38a8b120/3fbdb400>
>>ebp; e58abf3c <pg0+25488f3c/3fbdb400>
>>esp; e58abee0 <pg0+25488ee0/3fbdb400>

Trace; f8eaa20a <pg0+38a8720a/3fbdb400>
Trace; f8eaad45 <pg0+38a87d45/3fbdb400>
Trace; f8eaaf40 <pg0+38a87f40/3fbdb400>
Trace; f8eaaf33 <pg0+38a87f33/3fbdb400>
Trace; f8eaaf69 <pg0+38a87f69/3fbdb400>
Trace; c012e671 <worker_thread+1b1/260>
Trace; c011baf0 <default_wake_function+0/20>
Trace; c012e4c0 <worker_thread+0/260>
Trace; c01324e6 <kthread+b6/c0>
Trace; c0132430 <kthread+0/c0>
Trace; c0101369 <kernel_thread_helper+5/c>

This architecture has variable length instructions, decoding before eip is unreliable, take these instructions with a pinch of salt.

Code; f8ea9a43 <pg0+38a86a43/3fbdb400>
00000000 <_EIP>:
Code; f8ea9a43 <pg0+38a86a43/3fbdb400>
0: 04 8b add $0x8b,%al
Code; f8ea9a45 <pg0+38a86a45/3fbdb400>
2: 54 push %esp
Code; f8ea9a46 <pg0+38a86a46/3fbdb400>
3: 24 08 and $0x8,%al
Code; f8ea9a48 <pg0+38a86a48/3fbdb400>
5: 8b 40 04 mov 0x4(%eax),%eax
Code; f8ea9a4b <pg0+38a86a4b/3fbdb400>
8: 8b 40 18 mov 0x18(%eax),%eax
Code; f8ea9a4e <pg0+38a86a4e/3fbdb400>
b: 0f a3 10 bt %edx,(%eax)
Code; f8ea9a51 <pg0+38a86a51/3fbdb400>
e: 19 d2 sbb %edx,%edx
Code; f8ea9a53 <pg0+38a86a53/3fbdb400>
10: 31 c0 xor %eax,%eax
Code; f8ea9a55 <pg0+38a86a55/3fbdb400>
12: 85 d2 test %edx,%edx
Code; f8ea9a57 <pg0+38a86a57/3fbdb400>
14: 0f 95 c0 setne %al
Code; f8ea9a5a <pg0+38a86a5a/3fbdb400>
17: c3 ret
Code; f8ea9a5b <pg0+38a86a5b/3fbdb400>
18: 90 nop
Code; f8ea9a5c <pg0+38a86a5c/3fbdb400>
19: 8d 74 26 00 lea 0x0(%esi),%esi
Code; f8ea9a60 <pg0+38a86a60/3fbdb400>
1d: 8b 44 24 04 mov 0x4(%esp),%eax
Code; f8ea9a64 <pg0+38a86a64/3fbdb400>
21: 8b 54 24 08 mov 0x8(%esp),%edx
Code; f8ea9a68 <pg0+38a86a68/3fbdb400>
25: 8b 40 04 mov 0x4(%eax),%eax
Code; f8ea9a6b <pg0+38a86a6b/3fbdb400>
28: 8b 40 1c mov 0x1c(%eax),%eax

This decode from eip onwards should be reliable

Code; f8ea9a6e <pg0+38a86a6e/3fbdb400>
00000000 <_EIP>:
Code; f8ea9a6e <pg0+38a86a6e/3fbdb400> <=====
0: 0f a3 10 bt %edx,(%eax) <=====
Code; f8ea9a71 <pg0+38a86a71/3fbdb400>
3: 19 d2 sbb %edx,%edx
Code; f8ea9a73 <pg0+38a86a73/3fbdb400>
5: 31 c0 xor %eax,%eax
Code; f8ea9a75 <pg0+38a86a75/3fbdb400>
7: 85 d2 test %edx,%edx
Code; f8ea9a77 <pg0+38a86a77/3fbdb400>
9: 0f 95 c0 setne %al
Code; f8ea9a7a <pg0+38a86a7a/3fbdb400>
c: c3 ret
Code; f8ea9a7b <pg0+38a86a7b/3fbdb400>
d: 90 nop
Code; f8ea9a7c <pg0+38a86a7c/3fbdb400>
e: 8d 74 26 00 lea 0x0(%esi),%esi
Code; f8ea9a80 <pg0+38a86a80/3fbdb400>
12: 31 c0 xor %eax,%eax
Code; f8ea9a82 <pg0+38a86a82/3fbdb400>
14: c3 ret

1 warning and 1 error issued. Results may not be reliable.

Has this improved in the latest kernel versions?

Please reopen this bug if it's still present in kernel 2.6.19-rc6.
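
As a side note, the two address observations in this report can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `flipped_bits` and `bt_access_address` are made up for this note, the "expected" address 0xc08c00a4 is the one suggested in the first reply, and the register values are copied from the second oops. It assumes the documented i386 behaviour of `bt` with a register bit offset and a memory operand, where the CPU reads the 32-bit word at base + 4 * floor(offset / 32).

```python
def flipped_bits(observed, expected):
    """Return the bit positions at which two 32-bit values differ."""
    diff = (observed ^ expected) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


def bt_access_address(base, bit_offset):
    """Address of the dword read by `bt %reg,(mem)` on i386.

    The register bit offset is treated as a signed 32-bit value, and
    the word actually read is base + 4 * floor(bit_offset / 32).
    """
    if bit_offset & 0x80000000:          # reinterpret as signed
        bit_offset -= 1 << 32
    return (base + 4 * (bit_offset // 32)) & 0xFFFFFFFF


if __name__ == "__main__":
    # First crash: faulting address vs. the address suggested in the reply.
    print(flipped_bits(0xE08C00A4, 0xC08C00A4))            # [29]

    # Second crash: bt %edx,(%eax) with eax=f8ebc000, edx=00008000.
    print(hex(bt_access_address(0xF8EBC000, 0x00008000)))  # 0xf8ebd000
```

The first result shows the two addresses differ in exactly one bit (bit 29), which is what the flipped-bit suspicion relies on. The second result equals the faulting address f8ebd000 from the second oops: bit 0x8000 lies exactly 4096 bytes past eax, so the access falls on the page following the one eax points at. This is only consistent with a bit test running past the memory at eax; it does not by itself identify the root cause.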