When trying to boot 4.1.0, I get this oops:

------------------------------------------------------------
Reading all physical volumes. This may take a while...
Found volume group "rootvg" using metadata type lvm2
device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
md/raid1:mdX: active with 1 out of 2 mirrors
mdX: invalid bitmap file superblock: bad magic
md-cluster module not found.
mdX: Could not setup cluster service (256)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
IP: [<ffffffff8159e4a9>] _raw_spin_lock_irq+0x29/0x70
PGD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: dm_crypt microcode dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_mirror dm_region_hash dm_log dm_mod btrfs xor raid6_pq raid1 md_mod crct10dif_pclmul crc32_pclmul crc32_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd e1000e ahci libahci ehci_pci ptp xhci_pci ehci_hcd pps_core xhci_hcd thermal
CPU: 0 PID: 223 Comm: lvm Not tainted 4.1.0 #1
Hardware name: ASUS All Series/MAXIMUS VII RANGER, BIOS 2012 10/01/2014
task: ffff8805023fb1c0 ti: ffff880501dc4000 task.ti: ffff880501dc4000
RIP: 0010:[<ffffffff8159e4a9>] [<ffffffff8159e4a9>] _raw_spin_lock_irq+0x29/0x70
RSP: 0018:ffff880501dc7b18 EFLAGS: 00010082
RAX: 0000000000000100 RBX: 0000000000000100 RCX: 0000000000000000
RDX: ffff880501dc7ba0 RSI: 0000000000000000 RDI: 0000000000000100
RBP: ffff880501dc7b28 R08: 0000000800000000 R09: 00000008ffffffff
R10: ffffffffa050fd67 R11: 0000000000000246 R12: 0000000000000000
R13: ffff880501dc7ba0 R14: 0000000000000000 R15: ffff880501dc7cf0
FS: 00007f267b99b800(0000) GS:ffff88051ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000100 CR3: 0000000501f6b000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff880501dc7b58 0000000000000100 ffff880501dc7b88 ffffffffa03b58c8
 ffff8800dd4e52d0 00000000a0510080 ffff8800dd4e52e0 ffff880501fd0be8
 ffff880501dc7b88 0000000000000000 ffff880506a05010 0000000000000100
Call Trace:
 [<ffffffffa03b58c8>] bitmap_start_sync+0x38/0xe0 [md_mod] in bitmap_start_sync (drivers/md/bitmap.c:1475)
 [<ffffffffa03b7459>] bitmap_load+0x49/0x130 [md_mod] in bitmap_load (drivers/md/bitmap.c:1822)
 [<ffffffffa0579e3d>] raid_resume+0x13d/0x230 [dm_raid] in raid_resume (drivers/md/dm-raid.c:1690)
 [<ffffffffa0512469>] dm_table_resume_targets+0x99/0xe0 [dm_mod] in dm_table_resume_targets (drivers/md/dm-table.c:1637)
 [<ffffffffa050fa21>] dm_resume+0xc1/0x100 [dm_mod] in dm_resume (drivers/md/dm.c:3290)
 [<ffffffffa0514dcb>] dev_suspend+0x12b/0x260 [dm_mod] in dev_suspend (drivers/md/dm-ioctl.c:1048)
 [<ffffffffa0514ca0>] ? table_load+0x340/0x340 [dm_mod]
 [<ffffffffa05156ae>] ctl_ioctl+0x22e/0x510 [dm_mod] in ctl_ioctl (drivers/md/dm-ioctl.c:1849)
 [<ffffffffa05159a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod] in dm_ctl_ioctl (drivers/md/dm-ioctl.c:1867)
 [<ffffffff811d22a0>] do_vfs_ioctl+0x2e0/0x4e0
 [<ffffffff81258c46>] ? security_sem_associate+0x16/0x20
 [<ffffffff812450bb>] ? ipcget+0x14b/0x1d0
 [<ffffffff811d2521>] SyS_ioctl+0x81/0xa0
 [<ffffffff8159e9b2>] system_call_fastpath+0x16/0x75
Code: 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 fa 66 0f 1f 44 00 00 e8 a3 5c b8 ff 65 ff 05 bc c4 a6 7e b8 00 01 00 00 <f0> 66 0f c1 03 0f b6 cc 38 c1 75 0b 48 83 c4 08 5b 5d c3 0f 1f
RIP [<ffffffff8159e4a9>] _raw_spin_lock_irq+0x29/0x70
 RSP <ffff880501dc7b18>
CR2: 0000000000000100
---[ end trace 821307f06ab8f24a ]---
------------------------------------------------------------

Under kernel 4.0.5, the system at least boots. It does resync the RAID array at every boot, which is obviously less than optimal; stopping the md devices at shutdown always fails with "Device or resource in use". I'm not sure whether that's me or Debian doing something wrong. FWIW, here are the relevant messages from the 4.0.5 boot:

------------------------------------------------------------
device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
md/raid1:mdX: active with 1 out of 2 mirrors
mdX: invalid bitmap file superblock: bad magic
created bitmap (675 pages) for device mdX
mdX: bitmap initialized from disk: read 43 pages, set 44 of 1380520 bits
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:dm-3
 disk 1, wo:0, o:1, dev:dm-5
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:dm-3
 disk 1, wo:0, o:1, dev:dm-5
md: recovery of RAID array mdX
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 1413652480k.
------------------------------------------------------------

My setup is dm-crypt on top of LVM raid1 on physical disk.
Not being able to load md-cluster doesn't seem to be the only problem here. The system still does not boot after adding md-cluster.ko to the initramfs. Instead, I get output like this every 10 seconds:

------------------------------------------------------------
device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
md/raid1: mdX: active with 1 out of 2 mirrors
mdX: invalid bitmap file superblock: bad magic
dlm: no local IP address has been set
dlm: cannot start dlm lowcomms -107
mdX: Could not setup cluster service (-107)
mdX: failed to create bitmap (-107)
device-mapper: table: 253:6: raid: Fail to run raid array
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on failed: Transport endpoint is not connected
------------------------------------------------------------

Note: my setup is entirely local.
On Thu, Jun 25, 2015 at 09:02:45PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=100491
>
> Bug ID: 100491
> Summary: Oops under bitmap_start_sync [md_mod] at boot
[...]
> Reading all physical volumes. This may take a while...
> Found volume group "rootvg" using metadata type lvm2
> device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
> md/raid1:mdX: active with 1 out of 2 mirrors
> mdX: invalid bitmap file superblock: bad magic
> md-cluster module not found.
> mdX: Could not setup cluster service (256)
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
> IP: [<ffffffff8159e4a9>] _raw_spin_lock_irq+0x29/0x70
> PGD 0
> Oops: 0002 [#1] PREEMPT SMP
[...]

I'm marking this as a regression in bugzilla, since it seems to prevent booting on 4.1.0 at least in certain circumstances (namely those I have; I wonder if any raid1 recovery works?) while 4.0.6 boots correctly.

I bisected this down to one of four commits -- assuming the problem was caused by changes in drivers/md, which seems a fair assumption. The commits are:

$ git bisect view --oneline
f9209a3 bitmap_create returns bitmap pointer
96ae923 Gather on-going resync information of other nodes
54519c5 Lock bitmap while joining the cluster
b97e9257 Use separate bitmaps for each nodes in the cluster

The crash happens whether or not CONFIG_MD_CLUSTER is enabled.

Here are the versions I tested:

git bisect start '--' 'drivers/md'
# bad: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
# good: [39a8804455fb23f09157341d3ba7db6d7ae6ee76] Linux 4.0
# bad: [9ffc8f7cb9647b13dfe4d1ad0d5e1427bb8b46d6] md/raid5: don't do chunk aligned read on degraded array.
# bad: [6dc69c9c460b0cf05b5b3f323a8b944a2e52e76d] md: recover_bitmaps() can be static
# bad: [4b26a08af92c0d9c0bce07612b56ff326112321a] Perform resync for cluster node failure
# good: [cf921cc19cf7c1e99f730a2faa02d80817d684a2] Add node recovery callbacks
# skip: [96ae923ab659e37dd5fc1e05ecbf654e2f94bcbe] Gather on-going resync information of other nodes
# bad: [f9209a323547f054c7439a3bf67c45e64a054bdd] bitmap_create returns bitmap pointer
# skip: [54519c5f4b398bcfe599f652b4ef4004d5fa63ff] Lock bitmap while joining the cluster

Sami
The reason you are seeing the crash (and a resync every time you boot on earlier kernels) is that the bitmap superblock is corrupted. That is why you get the message:

mdX: invalid bitmap file superblock: bad magic

The cluster code is being activated because some padding in the superblock has been repurposed to store the number of nodes in the cluster. The corruption in your superblock most likely extends into this area as well and has left the number of nodes set to a non-zero value.

Could you run mdadm --examine-bitmap /dev/sdXX using Neil's cluster branch (http://git.neil.brown.name/?p=mdadm.git;a=shortlog;h=refs/heads/cluster) and provide the output here? Note: the command will produce some extra junk, but it should confirm the theory.

Finally, do you not allow the resync to complete? It should write a clean superblock once the resync finishes.
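[Editor's note: a minimal sketch of building that branch and running the command, for anyone following along. The clone URL and the target device are assumptions (on an LVM RAID setup the device to examine ends up being an rmeta sub-LV, as discussed further down); adjust both for your system.]

$ git clone git://neil.brown.name/mdadm mdadm-cluster
$ cd mdadm-cluster
$ git checkout cluster
$ make
# ./mdadm --examine-bitmap /dev/mapper/rootvg-newroot_rmeta_1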
I usually do indeed allow the resync to complete, yet at the next boot the superblock always seems to be corrupt again. Could it be invalidated by the failure to detach the DM device at shutdown? For some reason I always get a bunch of "systemd-shutdown[1]: Could not detach DM /dev/dm-[23456]: Device or resource busy" messages at shutdown (the first relevant error seems to be "Failed to detach loop devices: Invalid argument"; here's a photo of the messages: https://goo.gl/photos/jbURnDrfo3w2SGpJA). If that doesn't cause it, then the resync must be failing to write a clean superblock.

So should I run the mdadm command on the LVM rmeta volumes? I have:

# lvs --all
  LV                 VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  newroot            rootvg rwi-aor--- 1.32t                                    100.00
  [newroot_rimage_1] rootvg iwi-aor--- 1.32t
  [newroot_rimage_2] rootvg iwi-aor--- 1.32t
  [newroot_rmeta_1]  rootvg ewi-aor--- 4.00m
  [newroot_rmeta_2]  rootvg ewi-aor--- 4.00m

(newroot is the volume that always resyncs at boot; as you can see, it's in sync here.)

Running mdadm --examine-bitmap on either the rimage or the rmeta devices says the bitmap magic is invalid, so I assume LVM stores the bitmap somewhere else... I'm at the limits of my knowledge about LVM, but a comment in /etc/lvm/lvm.conf says the following (I assume the "log" is either the same thing as the bitmap or stored alongside it?):

# "raid1" - This implementation leverages MD's RAID1 personality through
#   device-mapper. It is characterized by a lack of log options.
#   (A log is always allocated for every device and they are placed
#   on the same device as the image - no separate devices are
#   required.) This mirror implementation does not require I/O
#   to be blocked in the kernel in the event of a failure.
#   This mirror implementation is not cluster-aware and cannot be
#   used in a shared (active/active) fashion in a cluster.
Would you be able to provide the exact options used to create the device? I will try to replicate an environment as close to yours as possible. The trouble is coming from md, so the output of /proc/mdstat would be useful. However, it is the interaction of the dm device with md that seems to be the problem.
--------------------
$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
unused devices: <none>
--------------------

The device started as a normal LVM volume, which I lvconverted to a 2-way RAID1 logical volume. The filesystem is from February 2013, and I assume the LVM devices were created with a kernel/lvm from around that time. The commands must have been something like this, off the top of my head:

[pvcreate on some volumes]
# vgcreate rootvg physical_volumes...
# lvcreate --size 1T --name newroot rootvg
[operations on /dev/rootvg/newroot, like creating a luks encrypted volume and a filesystem on it]
# lvconvert --type raid1 -m1 /dev/rootvg/newroot /path/to/physicalvolume1 /path/to/physicalvolume2

After that, one of the physical volumes failed, and I recovered somehow after removing the failing disk; I suspect by first converting back to a normal (linear) logical volume and then running the above lvconvert again. Sorry, the details are a bit fuzzy, especially since I found some of the lvconvert operations for replacing a failed volume quite confusing (I feel lucky that I didn't drop the working mirror or something). I could try to replicate it myself in a virtual machine. Still, it might be easier to figure out what's wrong right now and then try to deduce from that how it got this way.

I guess LVM must store the md bitmap somewhere, so I can dump it? I only roughly understand the interaction between LVM and device mapper. Is there no command or /proc file that would give details of the md mapping the way 'dmsetup table' does for device mapper? Are these things documented anywhere besides the source code? :)

I'm also pretty sure the volumes did stay in sync at some point. I could grep my logs to see whether the perpetual resyncing started when I replaced the failed disk or at some later point, e.g. after a kernel upgrade.
Thanks for the input. I could not recreate the situation on my system.

Yes, I think the "Could not detach" shutdown messages have something to do with the MD bitmaps not being correct/in sync. From the looks of it, the bitmap superblock is corrupted.

Could you boot with the loglevel=7 kernel parameter? This will print the MD bitmap super information right where it reports the error. Boot with 4.0.5 to be safe, because the error is happening there as well. I suspect the bug existed before the md-cluster code was merged; these patches just force the failure in a worse way.
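[Editor's note: a sketch of one way to add the parameter persistently on a Debian-style GRUB setup; the exact contents of the variable are an assumption, and a one-off edit of the kernel command line from the GRUB menu works just as well.]

# edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet loglevel=7"
# then regenerate the config and reboot:
# update-grub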
Created attachment 181491 [details]
dmesg at loglevel=7

dmesg at loglevel=7 from kernel 4.0.7 attached. The array was reportedly in sync before rebooting. Still the first relevant thing I can see there is "device-mapper: raid: Device 0 specified for rebuild: Clearing superblock".
The original crash is, in part, because "request_module" in the kernel returns a positive error code instead of a negative error code as documented. Sad. We'll fix that, but having the md-cluster module available will avoid that problem.

dm-raid has separate regions for the data and the metadata. This is seen in:

  [newroot_rimage_1] rootvg iwi-aor--- 1.32t
  [newroot_rimage_2] rootvg iwi-aor--- 1.32t
  [newroot_rmeta_1]  rootvg ewi-aor--- 4.00m
  [newroot_rmeta_2]  rootvg ewi-aor--- 4.00m

rimage_X is the data, rmeta_X is the metadata. The metadata contains a 4K block describing the array, followed by the bitmap. (mdadm doesn't know how to read this.)

If you can 'dd' the first 8K of each of these 'rmeta' devices and attach the result, I can have a look and see if there is anything to explain why the magic number is wrong.
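[Editor's note: a sketch of the dump commands, assuming the rmeta sub-LVs are exposed under /dev/mapper using LVM's usual <vg>-<lv> naming; adjust the device names to match your system.]

# dd if=/dev/mapper/rootvg-newroot_rmeta_1 of=rmeta_1.bin bs=4096 count=2
# dd if=/dev/mapper/rootvg-newroot_rmeta_2 of=rmeta_2.bin bs=4096 count=2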
Can I just clarify something: when you boot with a 4.0 kernel, does it all work as expected, or do you still get the "bad magic" errors?
With 4.0.x, I get "bad magic" and the array is reconstructed. With 4.1, the system fails to boot as described in Comment 1 (or the initial report if md-cluster is not available).
Created attachment 181521 [details]
first 8k of rmeta_1 after sync completed
Created attachment 181531 [details]
first 8k of rmeta_2 after sync completed

First 8k of rmeta_[12] attached. These were taken after the sync had completed (on a running system, so I guess they might be slightly inconsistent if one was written to between extracting the first and the second). I can probably also extract them after shutdown / before boot later today, though that's slightly tricky since recovery starts automatically early in boot. I can probably figure out their physical location on disk and dump them from a non-LVM/RAID-aware OS.
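[Editor's note: a rough sketch of how one might locate the rmeta areas on the underlying disk so they can be dumped from a rescue system. The report columns are standard LVM fields, but the offset arithmetic is an assumption about the on-disk layout; double-check before trusting it.]

# lvs -a -o lv_name,devices,seg_pe_ranges rootvg
# pvs -o pv_name,pe_start
# vgs -o vg_name,vg_extent_size rootvg

The byte offset of an rmeta sub-LV on its PV should then be roughly pe_start plus (first physical extent number x extent size), after which something like "dd if=/dev/sdXN bs=1M skip=<offset in MiB> count=4 of=rmeta_1.raw" can copy it out from any OS.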
Created attachment 181561 [details]
Patch to start with zeroed page

Could you please try this patch and report the output?
I just noticed another error in the code - obvious when you see it:

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 135a0907e9de..d2d294f3e4a0 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -671,7 +671,7 @@ out:
 	kunmap_atomic(sb);
 	/* Assiging chunksize is required for "re_read" */
 	bitmap->mddev->bitmap_info.chunksize = chunksize;
-	if (nodes && (bitmap->cluster_slot < 0)) {
+	if (err == 0 && nodes && (bitmap->cluster_slot < 0)) {
 		err = md_setup_cluster(bitmap->mddev, nodes);
 		if (err) {
 			pr_err("%s: Could not setup cluster service (%d)\n",

This still doesn't fix the fact that it resyncs all the time. That looks like a fairly deep problem with the integration of md/raid into dm.

Thanks for the logs. They show that the "bitmap superblock" certainly has garbage in 'magic' and 'version', though the 'events' counter is correct.

I'm quite sure I could recreate this for testing, so feel free to do anything you like to try to repair what you have. Possibly using lvconvert to remove the mirror and then lvconvert to put it back would work, but I have little experience with this.

I'll try to come up with something more substantial in the next day or two.
Sorry - another question just to clarify: could you report the output of

dmsetup table
dmsetup info

shortly after boot? Thanks.
Behavior at boot with 4.1.1 + patch from Comment 14 + patch from Comment 15 applied:

Reading all physical volumes. This may take a while...
Found volume group "rootvg" using metadata type lvm2
[2.27] device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
[2.27] md/raid1:mdX: active with 1 out of 2 mirrors
[2.27] mdX: invalid bitmap file superblock: bad magic
[2.27] mdX: failed to create bitmap (-22)
[2.27] device-mapper: table: 253:6: raid: Fail to run raid array
[2.27] device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on failed: Invalid argument
2 logical volume(s) in volume group "rootvg" now active
Reading all physical volumes. This may take a while...
Found volume group "rootvg" using metadata type lvm2
[12.3] device-mapper: raid: Device 0 specified for rebuild: Clearing superblock
[12.3] md/raid1:mdX: active with 1 out of 2 mirrors
[12.3] mdX: invalid bitmap file superblock: bad magic
[12.3] mdX: failed to create bitmap (-22)
[... and so on, the same messages repeating every 10 seconds]

Shortly after boot on 4.0.7:

$ sudo dmsetup table
rootvg-newroot_rimage_1: 0 1938857984 linear 8:22 10241
rootvg-newroot_rimage_1: 1938857984 888446976 linear 8:21 2049
rootvg-backup_crypt: 0 1677717504 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 253:0 4096
rootvg-scratch: 0 1468006400 linear 8:34 2054
rootvg-newroot: 0 2827304960 raid raid1 5 0 rebuild 0 region_size 2048 2 253:2 253:3 253:4 253:5
rootvg-newroot_crypt: 0 2827300864 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 253:6 4096
rootvg-newroot_rmeta_2: 0 8192 linear 8:2 2048
rootvg-newroot_rmeta_1: 0 8192 linear 8:22 2049
rootvg-backup: 0 974766080 linear 8:36 2052
rootvg-backup: 974766080 702955520 linear 8:35 147654663
rootvg-scratch_crypt: 0 1468002304 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 253:1 4096
rootvg-newroot_rimage_2: 0 2827304960 linear 8:2 10240

$ sudo dmsetup info
Name:              rootvg-newroot_rimage_1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 5
Number of targets: 2
UUID: LVM-jVJFdlmF0JFIstUe5PGj05TSVBOdvnhagwXAuZl9Wz28tsfeYkp60chiiVJi3oLo

Name:              rootvg-backup_crypt
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 9
Number of targets: 1
UUID: CRYPT-LUKS1-21107b8520f546168113f2435fa5751b-rootvg-backup_crypt

Name:              rootvg-scratch
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-jVJFdlmF0JFIstUe5PGj05TSVBOdvnhao5QuWUNb4cieO6Vebt2AudgMQ51oasSd

Name:              rootvg-newroot
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-jVJFdlmF0JFIstUe5PGj05TSVBOdvnhaaoYTyDfZ7vRS4OqTd0sAm6lg3zDd5CR5

Name:              rootvg-newroot_crypt
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: CRYPT-LUKS1-9fd745bf8f8244ff808b674b52f78194-rootvg-newroot_crypt

Name:              rootvg-newroot_rmeta_2
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-jVJFdlmF0JFIstUe5PGj05TSVBOdvnhaUpv34B9iBL0mSiWIQHNt6uC6eFd6JG1v

Differences to these after sync has been completed:

# diff -u dmsetup-table-after-boot.txt <(dmsetup table)
--- dmsetup-table-after-boot.txt	2015-07-02 09:32:26.392129572 +0300
+++ /dev/fd/63	2015-07-02 18:21:41.059161832 +0300
@@ -2,7 +2,7 @@
 rootvg-newroot_rimage_1: 1938857984 888446976 linear 8:21 2049
 rootvg-backup_crypt: 0 1677717504 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 253:0 4096
 rootvg-scratch: 0 1468006400 linear 8:34 2054
-rootvg-newroot: 0 2827304960 raid raid1 5 0 rebuild 0 region_size 2048 2 253:2 253:3 253:4 253:5
+rootvg-newroot: 0 2827304960 raid raid1 3 0 region_size 2048 2 253:2 253:3 253:4 253:5
 rootvg-newroot_crypt: 0 2827300864 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 253:6 4096
 rootvg-newroot_rmeta_2: 0 8192 linear 8:2 2048
 rootvg-newroot_rmeta_1: 0 8192 linear 8:22 2049

# diff -u dmsetup-info-after-boot.txt <(dmsetup info)
--- dmsetup-info-after-boot.txt	2015-07-02 09:32:30.791104303 +0300
+++ /dev/fd/63	2015-07-02 18:21:58.539068283 +0300
@@ -33,7 +33,7 @@
 Read Ahead:        256
 Tables present:    LIVE
 Open count:        1
-Event number:      1
+Event number:      2
 Major, minor:      253, 6
 Number of targets: 1
 UUID: LVM-jVJFdlmF0JFIstUe5PGj05TSVBOdvnhaaoYTyDfZ7vRS4OqTd0sAm6lg3zDd5CR5
There,

rootvg-newroot: 0 2827304960 raid raid1 5 0 rebuild 0 region_size 2048 2 253:2 253:3 253:4 253:5

I'm wondering why whatever is loading the table is purposely requesting a rebuild of the dev#0 position every time.
The sync issue does not seem to be the kernel's fault, but rather that of whatever is instantiating the device in the first place.
Ah, so a boot script bug possibly?
Not sure. I've never seen this type of issue before. I don't even think it is possible for the 'rebuild' param to be stored in the LVM metadata. (You could also check that by doing 'vgcfgbackup -f my_ma.txt <vg_name>'.) That means that something is deciding a rebuild needs to be enforced (or something else is going wildly wrong).
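[Editor's note: a sketch of that check against the volume group from this report; the grep pattern is just a guess at what a rebuild-related entry in the metadata text might look like.]

# vgcfgbackup -f rootvg_meta.txt rootvg
# grep -n -i rebuild rootvg_meta.txt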
This issue has been observed downstream in Arch Linux. [1] I was able to reproduce it in a virtual machine using a RAID1 LVM logical volume for the root filesystem. Aside from the kernel oops with Linux 4.1.1, I didn't notice any resyncs (or other issues) with Linux 4.0.7 and 3.14.47.

Since this issue is blocking Linux 4.1.x from moving out of our testing repository in Arch, it would be nice to know whether you have been able to reproduce it and possibly have an idea of where the problem might be. I'm not sure I can provide any other helpful information, but I can test anything pointed out.

[1] https://bugs.archlinux.org/task/45548
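[Editor's note: for anyone trying to reproduce this in a VM, a minimal sketch of creating a RAID1 LVM volume like those discussed here. Device names and sizes are made up, and the original reporter's volume was produced via lvconvert rather than created this way.]

# pvcreate /dev/vdb /dev/vdc
# vgcreate vgtest /dev/vdb /dev/vdc
# lvcreate --type raid1 -m1 -L 8G -n root vgtest
# mkfs.ext4 /dev/vgtest/root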
I am also able to reproduce this with 4.1.1 on a couple of Ubuntu systems with RAID1 LVM for the root filesystem. My symptoms are more or less identical to the others', except without the bad superblocks, resyncs, etc. No issues with 4.0.7 or 4.0.8.
Created attachment 182511 [details]
[PATCH] Enable loading of bitmaps for dm-md-raid1

This patch folds Neil's fix from comment #15 into the patch from comment #14. It should work for devices without major corruption in the bitmap superblock.
Evangelos, Ryan: Could you try the patch in comment #24 and report back? If you are not experiencing bad superblock errors, this patch should work.
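[Editor's note: for anyone unfamiliar with applying a kernel patch from bugzilla, a rough sketch; the attachment filename is a placeholder and your distribution's kernel build procedure may differ.]

$ cd linux-4.1.2
$ patch -p1 < ~/Downloads/enable-bitmap-loading-for-dm-md-raid1.patch
$ make olddefconfig
$ make -j$(nproc) bzImage modules
# make modules_install install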
(In reply to Goldwyn Rodrigues from comment #25)
> Evangelos, Ryan: Could you try the patch in comment #24 and report back? If
> you are not experiencing bad superblock errors, this patch should work.

Works for me, thanks! (Applied on top of Linux 4.1.2.)
Patch works for me as well, also on top of 4.1.2. Excellent!
A similar problem affects my system as well. From kernel 4.1.x through 4.2 (and linux-next as of yesterday), RAID5 does not work ("Clearing superblock", then an oops because md-cluster is not compiled into the kernel). Last time I got a message about an invalid hostname when I tried to boot a kernel with md-cluster compiled in. Should I try the patch or post dmesg output, or is there nothing I can help with because the root cause has already been found?
With 4.2.0-rc3, I can no longer reproduce the oops issue, and everything is happy.
FWIW, I'm seeing the same problem with both 4.1.6 and 4.2 (from the Debian 4.2-1~exp1 experimental package). I've even gone as far as creating an entirely new mirrored volume for my root and dd'ing the filesystem from the old one to the new one. I waited until it had completed syncing; it still reports an invalid bitmap file with a bad magic.

I did have a problem with some volumes rebuilding on every boot on 4.0, but that wasn't the case for the root volume. I've now replaced both of those the same way as above and they have stopped doing unnecessary rebuilds. But the root still can't be found on 4.1 or 4.2.
I also saw this:

[Start from the state described in previous comments, where every boot starts a resync.]

1. [in 4.0.6] lvconvert -m0 rootvg/newroot && lvconvert -m1 rootvg/newroot
2. Boot 4.2 -> does not boot (I have a photo of the messages, no oops)
3. Boot 4.0.6 -> boots; no longer resyncs, so *something* has changed
4. [in 4.0.6] lvconvert -m0 rootvg/newroot
5. Boot 4.2
6. [in 4.2] lvconvert -m1 rootvg/newroot

So far I haven't tried to reboot 4.2 after step 6, but I'll do it if there's general interest in knowing what happens.
Ok, it still doesn't boot in 4.2 after those steps (at least after a crash) :(
No idea if this is related, but the same computer also fails to even start to sync a raid array just after creating it. I filed a separate Bug 105241.
So, to save everyone else the hour and a half I spent figuring out how to work around this, here's what you need to do:

# lvconvert --mirrors -1 <lv_device> <pv_device>
# vgchange -ay
# lvconvert -m 1 <lv_device>

where <lv_device> is the device that's having issues and <pv_device> is the SECONDARY device in the RAID1 mirror. The first command will throw a warning but will remove the secondary device. vgchange should then be able to bring up the (now un-mirrored) volumes. After vgchange, run lvconvert again to add the mirror back; this requires a resync.

The above has to be done after every reboot.
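[Editor's note: a concrete illustration using the volume names from earlier in this thread. The PV path is a placeholder for the mirror leg you are willing to discard; this simply instantiates the commands above, it has not been verified against a particular LVM version.]

# lvconvert --mirrors -1 rootvg/newroot /dev/sdb1
# vgchange -ay rootvg
# lvconvert -m 1 rootvg/newroot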
> The above has to be done after every reboot.

I don't think something that has to be done after every reboot is really a workaround. How is this still a thing?
Has anybody managed to figure out exactly when this happens? I have a system where this at least used to happen even for new volumes in the volume group, but so far I have failed to reproduce it in a virtual machine with a new volume group.
The impression I get is that the bug causing the corruption was found and fixed, but nobody actually added any workaround for users with existing corruption.
I may have been mistaken when I said this needs to be done on every reboot. When I posted my last update, I had spent several hours working on this and thought I had rebooted into 4.5; I found out later that I had rebooted back into 3.16, and that may be what caused the corruption again. I've just updated to 4.5, so I will report back after the next reboot if this is still an issue.
I've run into similar trouble with LVM RAID 5 and 6 volumes. Older kernels load the bitmap superblock despite the zero magic and write it back again as zero (the padding is also written back with whatever junk it had). I fixed this by booting an old kernel (3.14) and rewriting the magic "bitm" at byte offset 4096 in the rmeta mapped devices, then doing a sync and powering off the machine (or otherwise stopping the bad superblock from being rewritten to disk).

The problem has occurred twice now, on different volumes. My guess is that something goes wrong when the MD unload / superblock write-back happens, perhaps because the reset or poweroff happens too soon. The first volume was one I didn't regularly start; the second was one I created very recently, running a 4.something kernel.
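[Editor's note: a sketch of what that repair might look like, reconstructed from the description above rather than the commenter's exact commands. Writing to rmeta devices by hand is risky: the device name is a placeholder, and the volume should be quiesced and the machine powered off immediately afterwards as described.]

# printf 'bitm' | dd of=/dev/mapper/rootvg-newroot_rmeta_1 bs=1 seek=4096 conv=notrunc
# sync
# poweroff -f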