Bug 12905
Summary: | heavy I/O and CRC errors causes dm-raid1 to freeze/lock up /dev/md1 | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Mark (bugzilla-kernel-org-sux2k0) |
Component: | Page Allocator | Assignee: | Andrew Morton (akpm) |
Status: | RESOLVED OBSOLETE | ||
Severity: | blocking | CC: | alan, bugzilla-kernel-org-sux2k0, dustins.kernel.spam |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25/7+Xen | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Mark
2009-03-20 18:49:36 UTC
What I/O controllers are in use and are there any logged entries about I/O errors ? It is really a bunch of different hardware. The only common thing they share is the AMD Opteron architecture. lspci for different machines: # lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control 01:04.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics Innovation) Volari Z7/Z9/Z9s # lspci 00:00.0 Host bridge: ATI Technologies Inc RS690 Host Bridge 00:01.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (Internal gfx) 00:04.0 PCI bridge: ATI Technologies Inc Device 7914 00:05.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (PCI Express Port 1) 00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA 00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0) 00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1) 00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2) 00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3) 00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4) 00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI) 00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 14) 00:14.1 IDE interface: ATI Technologies Inc SB600 IDE 00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series] 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller 03:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller # lspci 00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge 00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge 00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC 00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01) 00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01) 00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01) 00:05.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev b2) 01:0e.0 RAID bus controller: Broadcom BCM5785 [HT1000] SATA (Native SATA Mode) 02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10) 02:03.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10) # lspci 00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2) 00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2) 00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2) 00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2) 00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2) 00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2) 00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2) 00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2) 00:02.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1) 00:03.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1) 00:05.0 VGA compatible controller: nVidia Corporation C51 [GeForce 6150 LE] (rev a2) 00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2) 00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3) 00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3) 00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3) 00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3) 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev f1) 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev f1) 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev f1) 00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 04:07.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05) # lspci 00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07) 00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05) 00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03) 00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02) 00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b) 01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b) 01:0b.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) 01:0d.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705_2 Gigabit Ethernet (rev 03) 01:0e.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705_2 Gigabit Ethernet (rev 03) No, there are no I/O errors. Just some expected buffer I/O problems because of invalidated (full) LVM2 snapshots. This appears to be a VM deadlock so for first guess assigning to the mm/vm folks Enabled some kernel debug options in 2.6.28.8: /var/log/messages: Mar 22 21:24:02 h1450662 lvm[1301]: Snapshot vg--xen--vm--backup--snap is now 95% full. Mar 22 21:28:44 h1450662 kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception. Mar 22 21:28:44 h1450662 kernel: Mar 22 21:28:44 h1450662 kernel: ======================================================= Mar 22 21:28:44 h1450662 kernel: [ INFO: possible circular locking dependency detected ] Mar 22 21:28:44 h1450662 kernel: 2.6.28.8-3.2-default #3 Mar 22 21:28:44 h1450662 kernel: ------------------------------------------------------- Mar 22 21:28:44 h1450662 kernel: pdflush/7307 is trying to acquire lock: Mar 22 21:28:44 h1450662 kernel: (ksnaphd){--..}, at: [<ffffffff80256670>] flush_workqueue+0x0/0xb0 Mar 22 21:28:44 h1450662 kernel: Mar 22 21:28:44 h1450662 kernel: but task is already holding lock: Mar 22 21:28:44 h1450662 kernel: (&s->lock){--..}, at: [<ffffffff80466fa7>] __find_pending_exception+0xa7/0x1e0 Mar 22 21:28:44 h1450662 kernel: Mar 22 21:28:44 h1450662 kernel: which lock already depends on the new lock. Mar 22 21:28:44 h1450662 kernel: Mar 22 21:28:44 h1450662 kernel: Mar 22 21:28:44 h1450662 kernel: the existing dependency chain (in reverse order) is: Mar 22 21:28:47 h1450662 kernel: Mar 22 21:28:47 h1450662 kernel: -> #3 (&s->lock){--..}: Mar 22 21:28:47 h1450662 kernel: [<ffffffff8026c00a>] __lock_acquire+0x11ea/0x1bc0 Mar 22 21:28:47 h1450662 kernel: [<ffffffff8026ca71>] lock_acquire+0x91/0xc0 Mar 22 21:28:47 h1450662 kernel: [<ffffffff80504a63>] down_write+0x43/0x80 Mar 22 21:28:47 h1450662 kernel: [<ffffffff80467b49>] snapshot_map+0x79/0x320 Mar 22 21:28:47 h1450662 kernel: [<ffffffff8045f422>] __map_bio+0x42/0x160 Mar 22 21:28:47 h1450662 kernel: [<ffffffff8046062c>] __split_bio+0x4bc/0x590 Mar 22 21:28:47 h1450662 kernel: [<ffffffff804609ac>] dm_request+0x1fc/0x2a0 Mar 22 21:28:47 h1450662 kernel: [<ffffffff80348586>] generic_make_request+0x346/0x450 Mar 22 21:28:45 h1450662 lvm[1301]: Snapshot vg--xen--hwn-lvol--xen--vm--1466362--backup--snap changed state to: Invalid Mar 22 21:28:49 h1450662 kernel: [<ffffffff8034872b>] submit_bio+0x9b/0x150 Mar 22 21:28:49 h1450662 kernel: [<ffffffff802e978e>] submit_bh+0xee/0x130 Mar 22 21:28:49 h1450662 kernel: [<ffffffff802ed8c1>] block_read_full_page+0x2e1/0x3d0 Mar 22 21:28:49 h1450662 kernel: [<ffffffff802f1083>] blkdev_readpage+0x13/0x20 Mar 22 21:28:49 h1450662 kernel: [<ffffffff80299ce2>] __do_page_cache_readahead+0x212/0x290 Mar 22 21:28:49 h1450662 kernel: [<ffffffff8029a11b>] ondemand_readahead+0x1eb/0x270 Mar 22 21:28:49 h1450662 kernel: [<ffffffff8029a275>] page_cache_sync_readahead+0x25/0x30 Mar 22 21:28:49 h1450662 kernel: [<ffffffff8029278c>] generic_file_aio_read+0x43c/0x6b0 Mar 22 21:28:51 h1450662 lvm[1301]: No longer monitoring snapshot vg--xen--hwn-lvol--xen--vm--1466362--backup--snap Mar 22 21:28:49 h1450662 kernel: [<ffffffff802c58e1>] do_sync_read+0xf1/0x140 Mar 22 21:28:53 h1450662 kernel: [<ffffffff802c6448>] vfs_read+0xc8/0x170 Mar 22 21:28:53 h1450662 kernel: [<ffffffff802c65e0>] sys_read+0x50/0x90 Mar 22 21:28:53 h1450662 kernel: [<ffffffff8020c3cb>] system_call_fastpath+0x16/0x1b Mar 22 21:28:53 h1450662 kernel: [<ffffffffffffffff>] 0xffffffffffffffff Mar 22 21:28:53 h1450662 kernel: Mar 22 21:28:53 h1450662 kernel: -> #2 (&md->io_lock){----}: Mar 22 21:28:53 h1450662 kernel: [<ffffffff8026c00a>] __lock_acquire+0x11ea/0x1bc0 Mar 22 21:28:53 h1450662 kernel: [<ffffffff8026ca71>] lock_acquire+0x91/0xc0 Mar 22 21:28:53 h1450662 kernel: [<ffffffff80504ae6>] down_read+0x46/0x80 Mar 22 21:28:56 h1450662 kernel: [<ffffffff804607dc>] dm_request+0x2c/0x2a0 Mar 22 21:28:56 h1450662 kernel: [<ffffffff80348586>] generic_make_request+0x346/0x450 Mar 22 21:28:56 h1450662 kernel: [<ffffffff8034872b>] submit_bio+0x9b/0x150 Mar 22 21:28:56 h1450662 kernel: [<ffffffff80465bff>] dispatch_io+0x1bf/0x1f0 Mar 22 21:28:56 h1450662 kernel: [<ffffffff80465e35>] dm_io+0x185/0x2c0 Mar 22 21:28:56 h1450662 kernel: [<ffffffff80468b54>] do_metadata+0x24/0x30 Mar 22 21:28:56 h1450662 kernel: [<ffffffff80255b7c>] run_workqueue+0xfc/0x240 Mar 22 21:28:56 h1450662 kernel: [<ffffffff80256080>] worker_thread+0x60/0xb0 Mar 22 21:28:56 h1450662 kernel: [<ffffffff8025a1e9>] kthread+0x49/0x80 Mar 22 21:28:53 h1450662 lvm[1301]: Unlocking memory Mar 22 21:28:56 h1450662 kernel: [<ffffffff8020d519>] child_rip+0xa/0x11 Mar 22 21:29:00 h1450662 kernel: [<ffffffffffffffff>] 0xffffffffffffffff Mar 22 21:29:00 h1450662 kernel: Mar 22 21:29:00 h1450662 kernel: -> #1 (&req.work){--..}: Mar 22 21:29:00 h1450662 kernel: [<ffffffff8026c00a>] __lock_acquire+0x11ea/0x1bc0 Mar 22 21:29:00 h1450662 kernel: [<ffffffff8026ca71>] lock_acquire+0x91/0xc0 Mar 22 21:29:00 h1450662 kernel: [<ffffffff80255b76>] run_workqueue+0xf6/0x240 Mar 22 21:29:00 h1450662 kernel: [<ffffffff80256080>] worker_thread+0x60/0xb0 Mar 22 21:29:00 h1450662 kernel: [<ffffffff8025a1e9>] kthread+0x49/0x80 Mar 22 21:29:00 h1450662 kernel: [<ffffffff8020d519>] child_rip+0xa/0x11 Mar 22 21:29:05 h1450662 kernel: [<ffffffffffffffff>] 0xffffffffffffffff Mar 22 21:29:05 h1450662 kernel: Mar 22 21:29:05 h1450662 kernel: -> #0 (ksnaphd){--..}: Mar 22 21:29:05 h1450662 kernel: [<ffffffff8026c188>] __lock_acquire+0x1368/0x1bc0 Mar 22 21:29:05 h1450662 kernel: [<ffffffff8026ca71>] lock_acquire+0x91/0xc0 Mar 22 21:29:05 h1450662 kernel: [<ffffffff802566cd>] flush_workqueue+0x5d/0xb0 Mar 22 21:29:05 h1450662 kernel: [<ffffffff80468cc3>] chunk_io+0xe3/0x120 Mar 22 21:29:05 h1450662 kernel: [<ffffffff80468d5c>] write_header+0x5c/0x70 Mar 22 21:29:05 h1450662 kernel: [<ffffffff80468d84>] persistent_drop+0x14/0x30 Mar 22 21:29:05 h1450662 kernel: [<ffffffff80466eb5>] __invalidate_snapshot+0x35/0x80 Mar 22 21:29:08 h1450662 kernel: [<ffffffff804674b5>] origin_map+0x3d5/0x400 Mar 22 21:29:08 h1450662 kernel: [<ffffffff8045f422>] __map_bio+0x42/0x160 Mar 22 21:29:08 h1450662 kernel: [<ffffffff8046058e>] __split_bio+0x41e/0x590 Mar 22 21:29:08 h1450662 kernel: [<ffffffff804609ac>] dm_request+0x1fc/0x2a0 Mar 22 21:29:08 h1450662 kernel: [<ffffffff80348586>] generic_make_request+0x346/0x450 Mar 22 21:29:08 h1450662 kernel: [<ffffffff8034872b>] submit_bio+0x9b/0x150 Mar 22 21:29:08 h1450662 kernel: [<ffffffff802e978e>] submit_bh+0xee/0x130 Mar 22 21:29:08 h1450662 kernel: [<ffffffff802ecd01>] __block_write_full_page+0x1e1/0x3b0 Mar 22 21:29:08 h1450662 kernel: [<ffffffff802ecff9>] block_write_full_page+0x129/0x140 Mar 22 21:29:08 h1450662 kernel: [<ffffffffa0034086>] ext3_ordered_writepage+0xc6/0x190 [ext3] Mar 22 21:29:10 h1450662 kernel: [<ffffffff80297be5>] __writepage+0x15/0x50 Mar 22 21:29:10 h1450662 kernel: [<ffffffff80298827>] write_cache_pages+0x237/0x480 Mar 22 21:29:10 h1450662 kernel: [<ffffffff80298a92>] generic_writepages+0x22/0x30 Mar 22 21:29:10 h1450662 kernel: [<ffffffff80298add>] do_writepages+0x3d/0x50 Mar 22 21:29:10 h1450662 kernel: [<ffffffff802e488b>] __writeback_single_inode+0x9b/0x410 Mar 22 21:29:10 h1450662 kernel: [<ffffffff802e51e4>] generic_sync_sb_inodes+0x364/0x440 Mar 22 21:29:10 h1450662 kernel: [<ffffffff802e54a5>] writeback_inodes+0x65/0x100 Mar 22 21:29:10 h1450662 kernel: [<ffffffff802991a0>] background_writeout+0xb0/0xe0 Mar 22 21:29:10 h1450662 kernel: [<ffffffff802998d0>] pdflush+0x110/0x1f0 Mar 22 21:29:10 h1450662 kernel: [<ffffffff8025a1e9>] kthread+0x49/0x80 Mar 22 21:29:11 h1450662 kernel: [<ffffffff8020d519>] child_rip+0xa/0x11 Mar 22 21:29:11 h1450662 kernel: [<ffffffffffffffff>] 0xffffffffffffffff Mar 22 21:29:11 h1450662 kernel: Mar 22 21:29:11 h1450662 kernel: other info that might help us debug this: Mar 22 21:29:11 h1450662 kernel: Mar 22 21:29:11 h1450662 kernel: 5 locks held by pdflush/7307: Mar 22 21:29:11 h1450662 kernel: #0: (&type->s_umount_key#13){----}, at: [<ffffffff802e548e>] writeback_inodes+0x4e/0x100 Mar 22 21:29:11 h1450662 kernel: #1: (jbd_handle){--..}, at: [<ffffffffa0019e9e>] journal_start+0xce/0x140 [jbd] Mar 22 21:29:11 h1450662 kernel: #2: (&md->io_lock){----}, at: [<ffffffff804607dc>] dm_request+0x2c/0x2a0 Mar 22 21:29:11 h1450662 kernel: #3: (&_origins_lock){----}, at: [<ffffffff8046713c>] origin_map+0x5c/0x400 Mar 22 21:29:14 h1450662 kernel: #4: (&s->lock){--..}, at: [<ffffffff80466fa7>] __find_pending_exception+0xa7/0x1e0 Mar 22 21:29:14 h1450662 kernel: Mar 22 21:29:14 h1450662 kernel: stack backtrace: Mar 22 21:29:14 h1450662 kernel: Pid: 7307, comm: pdflush Not tainted 2.6.28.8-3.2-default #3 Mar 22 21:29:14 h1450662 kernel: Call Trace: Mar 22 21:29:14 h1450662 kernel: [<ffffffff8026aa08>] print_circular_bug_tail+0xd8/0xe0 Mar 22 21:29:14 h1450662 kernel: [<ffffffff8026c188>] __lock_acquire+0x1368/0x1bc0 Mar 22 21:29:14 h1450662 kernel: [<ffffffff8023acb8>] ? finish_task_switch+0x68/0xf0 Mar 22 21:29:14 h1450662 kernel: [<ffffffff8023ac50>] ? finish_task_switch+0x0/0xf0 Mar 22 21:29:14 h1450662 kernel: [<ffffffff80502da0>] ? thread_return+0xa3/0x723 Mar 22 21:29:00 h1450662 lvm[1301]: memlock_count dec to 0 Mar 22 21:29:16 h1450662 kernel: [<ffffffff8020ca08>] ? restore_args+0x0/0x30 Mar 22 21:29:16 h1450662 lvm[1301]: Dumping persistent device cache to /etc/lvm/.cache Mar 22 21:29:16 h1450662 kernel: [<ffffffff8026ca71>] lock_acquire+0x91/0xc0 Mar 22 21:29:16 h1450662 kernel: [<ffffffff80256670>] ? flush_workqueue+0x0/0xb0 Mar 22 21:29:16 h1450662 lvm[1301]: Locking /etc/lvm/.cache (F_WRLCK, 1) Mar 22 21:29:16 h1450662 kernel: [<ffffffff802566cd>] flush_workqueue+0x5d/0xb0 Mar 22 21:29:16 h1450662 kernel: [<ffffffff80256670>] ? flush_workqueue+0x0/0xb0 Mar 22 21:29:16 h1450662 kernel: [<ffffffff80468cc3>] chunk_io+0xe3/0x120 Mar 22 21:29:16 h1450662 kernel: [<ffffffff80468b30>] ? do_metadata+0x0/0x30 Mar 22 21:29:18 h1450662 kernel: [<ffffffff8026a3a9>] ? trace_hardirqs_on_caller+0x129/0x190 Mar 22 21:29:18 h1450662 kernel: [<ffffffff80468d5c>] write_header+0x5c/0x70 Mar 22 21:29:18 h1450662 kernel: [<ffffffff80468d84>] persistent_drop+0x14/0x30 Mar 22 21:29:18 h1450662 kernel: [<ffffffff80466eb5>] __invalidate_snapshot+0x35/0x80 Mar 22 21:29:18 h1450662 kernel: [<ffffffff804674b5>] origin_map+0x3d5/0x400 Mar 22 21:29:18 h1450662 kernel: [<ffffffff802930c6>] ? mempool_alloc+0x56/0x140 Mar 22 21:29:18 h1450662 kernel: [<ffffffff8045f422>] __map_bio+0x42/0x160 Mar 22 21:29:18 h1450662 kernel: [<ffffffff802eed7d>] ? bio_alloc_bioset+0x5d/0x100 Mar 22 21:29:18 h1450662 kernel: [<ffffffff8046058e>] __split_bio+0x41e/0x590 Mar 22 21:29:18 h1450662 kernel: [<ffffffff804601ab>] ? __split_bio+0x3b/0x590 Mar 22 21:29:21 h1450662 kernel: [<ffffffff80267c24>] ? get_lock_stats+0x34/0x80 Mar 22 21:29:21 h1450662 kernel: [<ffffffff804609ac>] dm_request+0x1fc/0x2a0 Mar 22 21:29:21 h1450662 kernel: [<ffffffff804607d4>] ? dm_request+0x24/0x2a0 Mar 22 21:29:21 h1450662 kernel: [<ffffffff80348586>] generic_make_request+0x346/0x450 Mar 22 21:29:21 h1450662 kernel: [<ffffffff802930c6>] ? mempool_alloc+0x56/0x140 Mar 22 21:29:21 h1450662 kernel: [<ffffffff802930c6>] ? mempool_alloc+0x56/0x140 Mar 22 21:29:21 h1450662 kernel: [<ffffffff8034872b>] submit_bio+0x9b/0x150 Mar 22 21:29:21 h1450662 kernel: [<ffffffff802eed7d>] ? bio_alloc_bioset+0x5d/0x100 Mar 22 21:29:21 h1450662 kernel: [<ffffffff802e978e>] submit_bh+0xee/0x130 Mar 22 21:29:21 h1450662 kernel: [<ffffffff802ecd01>] __block_write_full_page+0x1e1/0x3b0 Mar 22 21:29:22 h1450662 kernel: [<ffffffffa0035790>] ? ext3_get_block+0x0/0x130 [ext3] Mar 22 21:29:22 h1450662 kernel: [<ffffffffa0035790>] ? ext3_get_block+0x0/0x130 [ext3] Mar 22 21:29:22 h1450662 kernel: [<ffffffff802ecff9>] block_write_full_page+0x129/0x140 Mar 22 21:29:22 h1450662 kernel: [<ffffffffa0034086>] ext3_ordered_writepage+0xc6/0x190 [ext3] Mar 22 21:29:22 h1450662 kernel: [<ffffffff80297be5>] __writepage+0x15/0x50 Mar 22 21:29:22 h1450662 kernel: [<ffffffff80298827>] write_cache_pages+0x237/0x480 Mar 22 21:29:22 h1450662 kernel: [<ffffffff80297bd0>] ? __writepage+0x0/0x50 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802e4880>] ? __writeback_single_inode+0x90/0x410 Mar 22 21:29:22 h1450662 kernel: [<ffffffff80298a92>] generic_writepages+0x22/0x30 Mar 22 21:29:22 h1450662 kernel: [<ffffffff80298add>] do_writepages+0x3d/0x50 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802e488b>] __writeback_single_inode+0x9b/0x410 Mar 22 21:29:22 h1450662 kernel: [<ffffffff80506100>] ? _read_unlock+0x30/0x60 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802e51e4>] generic_sync_sb_inodes+0x364/0x440 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802e54a5>] writeback_inodes+0x65/0x100 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802991a0>] background_writeout+0xb0/0xe0 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802998d0>] pdflush+0x110/0x1f0 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802990f0>] ? background_writeout+0x0/0xe0 Mar 22 21:29:22 h1450662 kernel: [<ffffffff802997c0>] ? pdflush+0x0/0x1f0 Mar 22 21:29:22 h1450662 kernel: [<ffffffff8025a1e9>] kthread+0x49/0x80 Mar 22 21:29:22 h1450662 kernel: [<ffffffff8020d519>] child_rip+0xa/0x11 Mar 22 21:29:22 h1450662 kernel: [<ffffffff8020ca08>] ? restore_args+0x0/0x30 Mar 22 21:29:22 h1450662 kernel: [<ffffffff8025a1a0>] ? kthread+0x0/0x80 Mar 22 21:29:22 h1450662 kernel: [<ffffffff8020d50f>] ? child_rip+0x0/0x11 Mar 22 21:29:22 h1450662 kernel: __ratelimit: 7 callbacks suppressed Mar 22 21:29:22 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683024 Mar 22 21:29:22 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683025 Mar 22 21:29:22 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683026 Mar 22 21:29:22 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683027 Mar 22 21:29:22 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683028 Mar 22 21:29:22 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683029 Mar 22 21:29:23 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683030 Mar 22 21:29:23 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683031 Mar 22 21:29:23 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683032 Mar 22 21:29:23 h1450662 kernel: Buffer I/O error on device dm-3, logical block 1683033 Mar 22 21:29:16 h1450662 lvm[1301]: Opened /etc/lvm/.cache RO Mar 22 21:29:23 h1450662 lvm[1301]: Loaded persistent filter cache from /etc/lvm/.cache Mar 22 21:29:23 h1450662 lvm[1301]: Unlocking fd 8 Mar 22 21:29:23 h1450662 lvm[1301]: Closed /etc/lvm/.cache Mar 22 21:29:23 h1450662 lvm[1301]: Wiping internal VG cache Looks like the snapshot invalidation is not handled properly. hmmm no, also happens with nearly empty snapshot devices... Also happens with "echo check > .../mdX/md/sync_action" Update: I have two machines which seem to have totally identical hardware. One triggers this bug, the other not... The one triggering the bug show UDMA CRC errors for one disk (via smartctl -a). Are there any mdraid issues related to hardware defects? Improperly handled, specific I/O errors? Can't see anything in logs, though. Closing as obsolete, if this is still seen with modern kernels please re-open and update |