Most recent kernel where this bug did not occur: -
Distribution: Debian Sarge

Hardware Environment:

             total       used       free     shared    buffers     cached
Mem:        514148     153444     360704          0      10728      82708
-/+ buffers/cache:      60008     454140
Swap:       577784          0     577784

0000:00:00.0 Host bridge: Intel Corp. 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corp. 82865G/PE/P PCI to AGP Controller (rev 02)
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
0000:00:1d.3 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #4 (rev 02)
0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corp. 82801EB/ER (ICH5/ICH5R) Ultra ATA 100 Storage Controller (rev 02)
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:00:1f.5 Multimedia audio controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
0000:02:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
0000:02:04.0 RAID bus controller: VIA Technologies, Inc. VT6410 ATA133 RAID controller (rev 06)
0000:02:05.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
0000:02:09.0 VGA compatible controller: ATI Technologies Inc 3D Rage I/II 215GT [Mach64 GT] (rev 9a)
0000:02:0a.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c)
0000:02:0b.0 RAID bus controller: Promise Technology, Inc. PDC20319 (FastTrak S150 TX4) (rev 02)
0000:02:0c.0 RAID bus controller: Promise Technology, Inc. PDC20319 (FastTrak S150 TX4) (rev 02)

Software Environment:

ii  lvm2  2.01.04-5  The Linux Logical Volume Manager

Problem Description:

The kernel crashed after "pvmove -i 5 /dev/md2" showed 100%. LVM2 then blocked all file I/O completely. After a hard reset and a "pvmove --abort", everything looks as it did before:

    Finding all volume groups
    Finding volume group "iswfs3"
  --- Volume group ---
  VG Name               iswfs3
  System ID
  Format                lvm2
  Metadata Areas        6
  Metadata Sequence No  54
  VG Access             read/write
  VG Status             resizable
  MAX LV                255
  Cur LV                6
  Open LV               6
  Max PV                255
  Cur PV                6
  Act PV                6
  VG Size               293,62 GB
  PE Size               64,00 MB
  Total PE              4698
  Alloc PE / Size       3642 / 227,62 GB
  Free  PE / Size       1056 / 66,00 GB
  VG UUID               jD2hL8-rAOI-biZ8-XG7W-GyFJ-blxg-olvZ7q

  --- Logical volume ---
  LV Name                /dev/iswfs3/export
  VG Name                iswfs3
  LV UUID                trFtew-4oR5-RqvT-RJ5E-1pEp-HVsC-60xfLp
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                140,00 GB
  Current LE             2240
  Segments               4
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

  [...]

  --- Physical volumes ---
  PV Name               /dev/md2
  PV UUID               HjkSgb-aaf4-Tikp-sOhz-Db2A-ncS8-WGitfJ
  PV Status             allocatable
  Total PE / Free PE    783 / 576

  PV Name               /dev/md3
  PV UUID               mp6T4e-wt15-wotO-0Jgx-0fkP-N2aa-0pMdhG
  PV Status             allocatable
  Total PE / Free PE    783 / 0

  PV Name               /dev/md4
  PV UUID               OcYRYg-NWul-vjQa-FL0B-gAwz-FeRZ-kLBwNs
  PV Status             allocatable
  Total PE / Free PE    783 / 262

  PV Name               /dev/md5
  PV UUID               y7DwIp-Oyq7-q30o-ovuU-S7fV-In4i-d9Yt5q
  PV Status             allocatable
  Total PE / Free PE    783 / 0

  PV Name               /dev/md6
  PV UUID               MydNE6-rAzW-25KO-MJbO-awvb-zA3o-f8bmU5
  PV Status             allocatable
  Total PE / Free PE    783 / 0

  PV Name               /dev/md7
  PV UUID               GlydQL-aO0D-sbsE-JieB-udp8-B4z5-Y7aCcb
  PV Status             allocatable
  Total PE / Free PE    783 / 218

Steps to reproduce:

pvmove -i 5 /dev/md2

Kernel-Message:

Unable to handle kernel paging request at virtual address e08c00a4
 printing eip:
c031fffe
*pde = 014db067
*pte = 00000000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c031fffe>]    Not tainted VLI
EFLAGS: 00010246   (2.6.12y)
EIP is at core_in_sync+0xe/0x20
eax: 00008527   ebx: de7ca480   ecx: c0460180   edx: e08bf000
esi: c07cfe00   edi: 00000000   ebp: 00000000   esp: ccbcf7ec
ds: 007b   es: 007b   ss: 0068
Process smbd (pid: 9956, threadinfo=ccbce000 task=c5ab6020)
Stack: c0321e46 d11692a0 00008527 00000000 00000000 c0e7ceac c07cfe00 c0e7ceac
       ccbcf8a4 c03140d9 e08880b8 c07cfe00 c0e7ceb4 c07cf980 c07cf980 c0e7ceac
       c03143cb e08880b8 c07cfe00 c0e7ceac 00000000 00000002 00000010 c5ab6020
Call Trace:
 [<c0321e46>] mirror_map+0x76/0x100
 [<c03140d9>] __map_bio+0x49/0x120
 [<c03143cb>] __clone_and_map+0xfb/0x3a0
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c0314738>] __split_bio+0xc8/0x150
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c031483a>] dm_request+0x7a/0xb0
 [<c0297afd>] generic_make_request+0x17d/0x220
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c03140d9>] __map_bio+0x49/0x120
 [<c03143cb>] __clone_and_map+0xfb/0x3a0
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c0314738>] __split_bio+0xc8/0x150
 [<c031483a>] dm_request+0x7a/0xb0
 [<c0136810>] prep_new_page+0x60/0x70
 [<c0297afd>] generic_make_request+0x17d/0x220
 [<c02c90a8>] ata_check_status+0x28/0x30
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c02cc043>] ata_bmdma_setup_pio+0x43/0x60
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c012b540>] autoremove_wake_function+0x0/0x60
 [<c02cbe01>] ata_qc_issue+0x61/0xa0
 [<c0297c02>] submit_bio+0x62/0x100
 [<c0153b40>] bio_alloc_bioset+0xf0/0x1f0
 [<c0153fe4>] bio_add_page+0x34/0x40
 [<c02228c7>] _pagebuf_ioapply+0x217/0x2e0
 [<c0222a18>] pagebuf_iorequest+0x88/0x170
 [<c0221cbe>] xfs_buf_get_flags+0xce/0x140
 [<c0222538>] pagebuf_iostart+0x88/0xc0
 [<c0221dc6>] xfs_buf_read_flags+0x96/0xa0
 [<c021365f>] xfs_trans_read_buf+0x2ef/0x350
 [<c01f9dd9>] xfs_itobp+0x89/0x260
 [<c021f705>] kmem_zone_zalloc+0x35/0x60
 [<c01fb2e8>] xfs_iread+0x78/0x220
 [<c022930d>] linvfs_alloc_inode+0x2d/0x40
 [<c01f8a20>] xfs_iget_core+0xb0/0x520
 [<c01f8fc5>] xfs_iget+0x135/0x190
 [<c0214812>] xfs_dir_lookup_int+0xb2/0x130
 [<c021a010>] xfs_lookup+0x50/0x90
 [<c0226c22>] linvfs_lookup+0x52/0x90
 [<c015bcc1>] real_lookup+0xc1/0xf0
 [<c015bffd>] do_lookup+0x9d/0xb0
 [<c015c62c>] __link_path_walk+0x61c/0xbf0
 [<c01339b7>] filemap_nopage+0x1d7/0x3a0
 [<c015cc47>] link_path_walk+0x47/0xe0
 [<c015cf31>] path_lookup+0x71/0x120
 [<c015d1a3>] __user_walk+0x33/0x60
 [<c0157eaf>] vfs_stat+0x1f/0x60
 [<c01585fb>] sys_stat64+0x1b/0x40
 [<c0104d04>] math_state_restore+0x24/0x40
 [<c0102849>] syscall_call+0x7/0xb
Code: 04 8b 40 04 8b 50 1c 8b 44 24 08 0f a3 02 19 c0 85 c0 0f 95 c0 0f b6 c0 c3 8d 74 26 00 8b 44 24 04 8b 40 04 8b 50 20 8b 44 24 08 <0f> a3 02 19 c0 85 c0 0f 95 c0 0f b6 c0 c3 8d 74 26 00 31 c0 c3
bugme-daemon@kernel-bugs.osdl.org wrote:
>
> Unable to handle kernel paging request at virtual address e08c00a4

I'd be suspecting a flipped bit in memory: the kernel wanted 0xc08c00a4 there. Is it reproducible? How good is the hardware? Suggest you run memtest86 on it for 24 hours.
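The single-bit-flip suspicion can be checked with a little arithmetic. The sketch below simply XORs the address from the oops with the address the comment says the kernel "wanted" and lists the differing bit positions. The variable names are illustrative only, and the expected address is taken from the comment above rather than derived independently.

```python
# Compare the faulting address from the oops with the address the
# commenter believes the kernel intended to access.
observed = 0xE08C00A4   # "virtual address e08c00a4" from the oops
expected = 0xC08C00A4   # address suggested in the comment (an assumption)

diff = observed ^ expected
# Collect every bit position where the two 32-bit values differ.
flipped_bits = [i for i in range(32) if (diff >> i) & 1]

print(hex(diff), flipped_bits)
```

The two values differ in exactly one bit (bit 29), which is what makes a memory error a plausible first guess. It does not by itself show that the expected address is the right one.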
Running memtest is difficult, since the machine is our institute's file server. I'll try to run it over the weekend, when most employees are not working, and report the results then.
I ran memtest86 yesterday for 13.5 hours. It did not detect any errors in the memory system. The system has also had uptimes of up to 200 days without any errors before (no more than 200 because of maintenance or kernel upgrades).
Today I had another crash during pvmove on a different computer:

ksymoops 2.4.9 on i686 2.6.13. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.13/ (default)
     -m /boot/System.map-2.6.13 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Machine check exception polling timer started.
0000:00:03.0: Realtek RTL8201 PHY transceiver found at address 1.
0000:00:03.0: Using transceiver found at address 1 as default
ehci_hcd 0000:00:0d.3: debug port 1
Unable to handle kernel paging request at virtual address f8ebd000
f8ea9a6e
*pde = 3744f067
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<f8ea9a6e>]    Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246   (2.6.13)
eax: f8ebc000   ebx: f6ffa78c   ecx: f8eae120   edx: 00008000
esi: 00008000   edi: 00000000   ebp: e58abf3c   esp: e58abee0
ds: 007b   es: 007b   ss: 0068
Stack: f8eaa20a d0981c20 00008000 00000001 d0a15860 f6ffa78c f8eaad45 f6ffa78c
       00008000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
       f6ffa780 f8eae1e8 00000000 f8eaaf40 f8eaaf33 f6ffa780 e58abf3c 00000000
Call Trace:
 [<f8eaa20a>] rh_state+0x3a/0x60 [dm_mirror]
 [<f8eaad45>] do_writes+0x85/0x200 [dm_mirror]
 [<f8eaaf40>] do_work+0x0/0x60 [dm_mirror]
 [<f8eaaf33>] do_mirror+0x73/0x80 [dm_mirror]
 [<f8eaaf69>] do_work+0x29/0x60 [dm_mirror]
 [<c012e671>] worker_thread+0x1b1/0x260
 [<c011baf0>] default_wake_function+0x0/0x20
 [<c012e4c0>] worker_thread+0x0/0x260
 [<c01324e6>] kthread+0xb6/0xc0
 [<c0132430>] kthread+0x0/0xc0
 [<c0101369>] kernel_thread_helper+0x5/0xc
Code: 04 8b 54 24 08 8b 40 04 8b 40 18 0f a3 10 19 d2 31 c0 85 d2 0f 95 c0 c3 90 8d 74 26 00 8b 44 24 04 8b 54 24 08 8b 40 04 8b 40 1c <0f> a3 10 19 d2 31 c0 85 d2 0f 95 c0 c3 90 8d 74 26 00 31 c0 c3

>>EIP; f8ea9a6e <pg0+38a86a6e/3fbdb400>   <=====

>>eax; f8ebc000 <pg0+38a99000/3fbdb400>
>>ebx; f6ffa78c <pg0+36bd778c/3fbdb400>
>>ecx; f8eae120 <pg0+38a8b120/3fbdb400>
>>ebp; e58abf3c <pg0+25488f3c/3fbdb400>
>>esp; e58abee0 <pg0+25488ee0/3fbdb400>

Trace; f8eaa20a <pg0+38a8720a/3fbdb400>
Trace; f8eaad45 <pg0+38a87d45/3fbdb400>
Trace; f8eaaf40 <pg0+38a87f40/3fbdb400>
Trace; f8eaaf33 <pg0+38a87f33/3fbdb400>
Trace; f8eaaf69 <pg0+38a87f69/3fbdb400>
Trace; c012e671 <worker_thread+1b1/260>
Trace; c011baf0 <default_wake_function+0/20>
Trace; c012e4c0 <worker_thread+0/260>
Trace; c01324e6 <kthread+b6/c0>
Trace; c0132430 <kthread+0/c0>
Trace; c0101369 <kernel_thread_helper+5/c>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code; f8ea9a43 <pg0+38a86a43/3fbdb400>
00000000 <_EIP>:
Code; f8ea9a43 <pg0+38a86a43/3fbdb400>
   0:   04 8b                     add    $0x8b,%al
Code; f8ea9a45 <pg0+38a86a45/3fbdb400>
   2:   54                        push   %esp
Code; f8ea9a46 <pg0+38a86a46/3fbdb400>
   3:   24 08                     and    $0x8,%al
Code; f8ea9a48 <pg0+38a86a48/3fbdb400>
   5:   8b 40 04                  mov    0x4(%eax),%eax
Code; f8ea9a4b <pg0+38a86a4b/3fbdb400>
   8:   8b 40 18                  mov    0x18(%eax),%eax
Code; f8ea9a4e <pg0+38a86a4e/3fbdb400>
   b:   0f a3 10                  bt     %edx,(%eax)
Code; f8ea9a51 <pg0+38a86a51/3fbdb400>
   e:   19 d2                     sbb    %edx,%edx
Code; f8ea9a53 <pg0+38a86a53/3fbdb400>
  10:   31 c0                     xor    %eax,%eax
Code; f8ea9a55 <pg0+38a86a55/3fbdb400>
  12:   85 d2                     test   %edx,%edx
Code; f8ea9a57 <pg0+38a86a57/3fbdb400>
  14:   0f 95 c0                  setne  %al
Code; f8ea9a5a <pg0+38a86a5a/3fbdb400>
  17:   c3                        ret
Code; f8ea9a5b <pg0+38a86a5b/3fbdb400>
  18:   90                        nop
Code; f8ea9a5c <pg0+38a86a5c/3fbdb400>
  19:   8d 74 26 00               lea    0x0(%esi),%esi
Code; f8ea9a60 <pg0+38a86a60/3fbdb400>
  1d:   8b 44 24 04               mov    0x4(%esp),%eax
Code; f8ea9a64 <pg0+38a86a64/3fbdb400>
  21:   8b 54 24 08               mov    0x8(%esp),%edx
Code; f8ea9a68 <pg0+38a86a68/3fbdb400>
  25:   8b 40 04                  mov    0x4(%eax),%eax
Code; f8ea9a6b <pg0+38a86a6b/3fbdb400>
  28:   8b 40 1c                  mov    0x1c(%eax),%eax

This decode from eip onwards should be reliable

Code; f8ea9a6e <pg0+38a86a6e/3fbdb400>
00000000 <_EIP>:
Code; f8ea9a6e <pg0+38a86a6e/3fbdb400>   <=====
   0:   0f a3 10                  bt     %edx,(%eax)   <=====
Code; f8ea9a71 <pg0+38a86a71/3fbdb400>
   3:   19 d2                     sbb    %edx,%edx
Code; f8ea9a73 <pg0+38a86a73/3fbdb400>
   5:   31 c0                     xor    %eax,%eax
Code; f8ea9a75 <pg0+38a86a75/3fbdb400>
   7:   85 d2                     test   %edx,%edx
Code; f8ea9a77 <pg0+38a86a77/3fbdb400>
   9:   0f 95 c0                  setne  %al
Code; f8ea9a7a <pg0+38a86a7a/3fbdb400>
   c:   c3                        ret
Code; f8ea9a7b <pg0+38a86a7b/3fbdb400>
   d:   90                        nop
Code; f8ea9a7c <pg0+38a86a7c/3fbdb400>
   e:   8d 74 26 00               lea    0x0(%esi),%esi
Code; f8ea9a80 <pg0+38a86a80/3fbdb400>
  12:   31 c0                     xor    %eax,%eax
Code; f8ea9a82 <pg0+38a86a82/3fbdb400>
  14:   c3                        ret

1 warning and 1 error issued. Results may not be reliable.
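Both oopses fault on a `bt` (bit test) instruction with a register bit index and a memory base, so the faulting address can be recomputed from the register dumps. On i386, `bt reg, (mem)` reads the 32-bit word at base + 4 * (index / 32). The sketch below applies that rule to the registers of each report; the helper name `bt_word_address` is made up for illustration, and the reading that the base register points at a region bitmap is an assumption drawn from the function names in the traces (core_in_sync, rh_state), not something the dumps state directly.

```python
def bt_word_address(base: int, bit_index: int) -> int:
    """Address of the 32-bit word read by `bt reg, (mem)` on i386.

    Assumes a non-negative bit index, as in both register dumps.
    """
    return (base + 4 * (bit_index // 32)) & 0xFFFFFFFF

# First oops (2.6.12y): bt %eax,(%edx) with edx = e08bf000, eax = 00008527
first = bt_word_address(0xE08BF000, 0x8527)

# Second oops (2.6.13): bt %edx,(%eax) with eax = f8ebc000, edx = 00008000
second = bt_word_address(0xF8EBC000, 0x8000)

print(hex(first), hex(second))
```

The computed addresses match the reported fault addresses e08c00a4 and f8ebd000 exactly. In both cases the bit index is at or beyond 0x8000, the number of bits in one 4 KiB page, and the access lands in the page following the base. That pattern would be consistent with a bit index running past the end of a one-page bitmap, although the dumps alone do not show how large the bitmap actually was.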
Has this improved in the latest kernel versions?
Please reopen this bug if it's still present in kernel 2.6.19-rc6.