Most recent kernel where this bug did not occur: 2.6.14-rc1
Distribution: Linux/GNU Debian 3.0 Sarge
Hardware Environment: 2x Pentium Xeon 2.8, 512MB DDR, 3ware 9xxx SATA raid controller, 6x 250GB SATA HDDs, Raid0 = 1.36T
Software Environment: libdevmapper 1.01.04 stable or 1.01.05-cvs, lvm 2.01.09 stable or 2.01.15-cvs, samba 3.0.14a

Problem Description:

From dmesg:

device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
device-mapper: dm-multipath version 1.0.4 loaded
device-mapper: dm-round-robin version 1.0.0 loaded
scsi0 : 3ware 9000 Storage Controller
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xfe8ffc00, IRQ: 10.
SCSI device sda: 2929557504 512-byte hdwr sectors (1499933 MB)
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
XFS mounting filesystem dm-1
Ending clean XFS mount for filesystem: dm-1
XFS quotacheck dm-1: Please wait.
XFS quotacheck dm-1: Done.
XFS mounting filesystem dm-4
XFS resetting qflags for filesystem dm-4
Ending clean XFS mount for filesystem: dm-4
XFS mounting filesystem dm-6
XFS resetting qflags for filesystem dm-6
Ending clean XFS mount for filesystem: dm-6
------------[ cut here ]------------
kernel BUG at drivers/md/kcopyd.c:145!
invalid operand: 0000 [#1]
Modules linked in: eepro100 e1000 3w_9xxx uhci_hcd ohci_hcd usbhid sd_mod ftdi_sio usbserial ehci_hcd usbcore
CPU: 0
EIP: 0060:[<c0368f2b>] Not tainted VLI
EFLAGS: 00010283 (2.6.14-rc1)
EIP is at client_free_pages+0x3b/0x50
eax: 00000100 ebx: ccdf2e80 ecx: cb8ab8a0 edx: 00000000
esi: e08ad080 edi: 00000000 ebp: 00000000 esp: d2763ec4
ds: 007b es: 007b ss: 0068
Process lvremove (pid: 14694, threadinfo=d2762000 task=dc4485a0)
Stack: ccdf2e80 ccdf2e80 c036997e ccdf2e80 c7a0fd80 c036cdb4 ccdf2e80 dd696a40
       e08ad080 deddb580 c0363dda e08ad080 deddb580 cb8ab6a0 d2762000 080e0c28
       c03660d9 deddb580 00000000 c048c7e0 e088b000 c0366940 cb8ab6a0 00000000
Call Trace:
 [<c036997e>] kcopyd_client_destroy+0x1e/0x3a
 [<c036cdb4>] snapshot_dtr+0x74/0x90
 [<c0363dda>] table_destroy+0x8a/0xa0
 [<c03660d9>] __hash_remove+0x79/0xa0
 [<c0366940>] dev_remove+0x50/0xe0
 [<c0368019>] ctl_ioctl+0xf9/0x160
 [<c03668f0>] dev_remove+0x0/0xe0
 [<c0169778>] do_ioctl+0x58/0x80
 [<c0169905>] vfs_ioctl+0x65/0x1e0
 [<c038cc44>] net_rx_action+0x74/0x100
 [<c0169ae7>] sys_ioctl+0x67/0x90
 [<c0102bb9>] syscall_call+0x7/0xb
Code: 10 75 28 8b 43 08 89 04 24 e8 52 ff ff ff c7 43 08 00 00 00 00 c7 43 0c 00 00 00 00 c7 43 10 00 00 00 00 8b 5c 24 04 83 c4 08 c3 <0f> 0b 91 00 b4 93 43 c0 eb ce 8d 74 26 00 8d bc 27 00 00 00 00
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000034
printing eip:
c015c13d
*pde = 15124001
Oops: 0000 [#2]
Modules linked in: eepro100 e1000 3w_9xxx uhci_hcd ohci_hcd usbhid sd_mod ftdi_sio usbserial ehci_hcd usbcore
CPU: 0
EIP: 0060:[<c015c13d>] Not tainted VLI
EFLAGS: 00010292 (2.6.14)
EIP is at bio_add_page+0xd/0x40
eax: 00000000 ebx: 00000010 ecx: 00000000 edx: d5d56500
esi: 00000000 edi: d5d56500 ebp: d4f05ef0 esp: d4f05e2c
ds: 007b es: 007b ss: 0068
Process kcopyd (pid: 14496, threadinfo=d4f04000 task=dff7e030)
Stack: 00000001 00000010 00000000 00000000 d4f05ef0 c036881c d5d56500 c1280fc0
       00001000 00000000 c1280fc0 00001000 00000000 d4f05ef0 00000000 00000000
       00000001 c0368939 00000001 00000000 d9b118cc d4f05ef0 c1b16c40 c03685b0
Call Trace:
 [<c036881c>] do_region+0xec/0x140
 [<c0368939>] dispatch_io+0xc9/0xd0
 [<c03685b0>] list_get_page+0x0/0x30
 [<c03685e0>] list_next_page+0x0/0x20
 [<c03690c0>] complete_io+0x0/0x90
 [<c0368ac3>] async_io+0x83/0xe0
 [<c0368c73>] dm_io_async+0x53/0x60
 [<c03690c0>] complete_io+0x0/0x90
 [<c03685b0>] list_get_page+0x0/0x30
 [<c03685e0>] list_next_page+0x0/0x20
 [<c0369198>] run_io_job+0x48/0x90
 [<c03690c0>] complete_io+0x0/0x90
 [<c03692b2>] process_jobs+0x42/0xa0
 [<c0369310>] do_work+0x0/0x50
 [<c0369352>] do_work+0x42/0x50
 [<c0369150>] run_io_job+0x0/0x90
 [<c0125e3d>] worker_thread+0x1ad/0x250
 [<c0113a70>] default_wake_function+0x0/0x20
 [<c0113a70>] default_wake_function+0x0/0x20
 [<c0125c90>] worker_thread+0x0/0x250
 [<c0129d2a>] kthread+0xaa/0xb0
 [<c0129c80>] kthread+0x0/0xb0
 [<c0100fd9>] kernel_thread_helper+0x5/0xc
Code: eb ac 0f b7 86 42 01 00 00 66 39 43 1e 0f 83 6e fe ff ff e9 9b fe ff ff 8d b6 00 00 00 00 83 ec 14 8b 54 24 18 8b 42 0c 8b 40 50 <8b> 48 34 8b 44 24 24 89 54 24 04 89 0c 24 89 44 24 10 8b 44 24

Steps to reproduce:

pvcreate /dev/sda
vgcreate vg /dev/sda
lvcreate -L 698G -n lv vg /dev/sda
mkfs.xfs /dev/vg/lv
mount -t xfs -o usrquota,grpquota,noatime,nodiratime /dev/vg/lv /mnt/lv

Set up an smb share (/mnt/smb).

Start copying data from a remote host to the smb share. While the copy is running, create and remove 2 big snapshots (347G both) with lvcreate / lvremove, wrapped in xfs_freeze -f/-u:

xfs_freeze -f /mnt/lv
lvcreate -s -l 11168 -n S1 -p rw /dev/vg/lv &
sleep 7
xfs_freeze -u /mnt/lv
mount -t xfs -o noatime,nodiratime,nouuid,ro /dev/vg/S1 /snapshots/S1
sleep 30

Create a second snapshot (named S2) the same way as S1 above and mount it too.

sleep 3m
xfs_freeze -f /mnt/lv
lvremove /dev/vg/S1 &
sleep 1
xfs_freeze -u /mnt/lv
sleep 30

Remove the second snapshot the same way as S1.

Repeat the lvcreate/lvremove cycle for the 2 snapshots 5 times, as shown above.

Thanks for any help.

Best Regards,
Hubert
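The reproduction steps in this report can be sketched as a small script. This is only a sketch under assumptions: the helper names (run, run_bg, create_snapshot, remove_snapshot, one_iteration) and the DRY_RUN switch are made up for illustration, the snapshot is removed via /dev/vg/S1, and no unmount step is included because the report does not show one. By default it only prints the commands; running it for real is destructive and needs root plus a scratch volume group.

```sh
#!/bin/sh
# Sketch of the snapshot create/remove loop described in the report.
# DRY_RUN=1 (default): print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
VG=vg                 # volume group name used in the report
LV=lv                 # origin logical volume used in the report
ORIGIN_MNT=/mnt/lv    # mount point of the origin XFS filesystem
SNAP_EXTENTS=11168    # snapshot size in extents, as in the report

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Like run, but start the command in the background when executing.
run_bg() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$* &"
    else
        "$@" &
    fi
}

create_snapshot() {   # $1 = snapshot name
    run xfs_freeze -f "$ORIGIN_MNT"
    run_bg lvcreate -s -l "$SNAP_EXTENTS" -n "$1" -p rw "/dev/$VG/$LV"
    run sleep 7
    run xfs_freeze -u "$ORIGIN_MNT"
    run mount -t xfs -o noatime,nodiratime,nouuid,ro "/dev/$VG/$1" "/snapshots/$1"
}

remove_snapshot() {   # $1 = snapshot name
    run xfs_freeze -f "$ORIGIN_MNT"
    run_bg lvremove "/dev/$VG/$1"
    run sleep 1
    run xfs_freeze -u "$ORIGIN_MNT"
}

one_iteration() {
    create_snapshot S1
    run sleep 30
    create_snapshot S2
    run sleep 180     # "sleep 3m" in the report
    remove_snapshot S1
    run sleep 30
    remove_snapshot S2
}

# Five iterations, as in the report.
i=1
while [ "$i" -le 5 ]; do
    one_iteration
    i=$((i + 1))
done
```

The copy to the smb share has to be started separately from a remote host before the loop runs; the script does not attempt to model it.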
Do you still see problems if you remove all the xfs_freeze commands? They shouldn't be necessary (dm handles the freezing itself) and they have caused trouble in the past.
Without xfs_freeze -f/u (or dmsetup suspend/resume), any lvcreate or lvremove of a snapshot of the XFS lv while data is being copied to the smb share hangs, and the lvcreate or lvremove process stays in memory as a dead process. I noticed that with xfs_freeze -f/u the lvcreate or lvremove process goes better than with dmsetup suspend/resume.

I know that dm handles the freezing itself, but it is still not enough, or, sorry, it does it the wrong way.

What is the proper dmsetup usage with lvm2 snapshot create/remove?

Regards,
Hubert
Is this bug still present in current kernels?
Please reopen this bug if it's still present with kernel 2.6.20.