Created attachment 166101
photo of screen, kernel panic

# fallocate -l 250M b1
# fallocate -l 250M b2
# fallocate -l 250M b3
# fallocate -l 250M b4
# fallocate -l 250M b5
# mkfs.btrfs -M -draid1 -mraid1 b1 b2 b3 b4 b5
# losetup -f b[12345]
# mount /dev/loop0 /mnt/btr
# cp <somefiles> /mnt/btr
# btrfs fi show
Label: none  uuid: be5b794d-a603-4eee-adfa-74711c010cfa
    Total devices 5 FS bytes used 366.42MiB
    devid 1 size 250.00MiB used 250.00MiB path /dev/loop0
    devid 2 size 250.00MiB used 177.00MiB path /dev/loop1
    devid 3 size 250.00MiB used 192.00MiB path /dev/loop2
    devid 4 size 250.00MiB used 246.00MiB path /dev/loop3
    devid 5 size 250.00MiB used 249.00MiB path /dev/loop4

Btrfs v3.18.2

[root@f22s btr]# btrfs scrub start -Bd /mnt/btr
scrub device /dev/loop0 (id 1) done
    scrub started at Sun Feb 8 17:59:30 2015 and finished after 3 seconds
    total bytes scrubbed: 137.52MiB with 0 errors
scrub device /dev/loop1 (id 2) done
    scrub started at Sun Feb 8 17:59:30 2015 and finished after 3 seconds
    total bytes scrubbed: 176.32MiB with 0 errors
scrub device /dev/loop2 (id 3) done
    scrub started at Sun Feb 8 17:59:30 2015 and finished after 3 seconds
    total bytes scrubbed: 191.33MiB with 0 errors
scrub device /dev/loop3 (id 4) done
    scrub started at Sun Feb 8 17:59:30 2015 and finished after 3 seconds
    total bytes scrubbed: 137.70MiB with 0 errors
scrub device /dev/loop4 (id 5) done
    scrub started at Sun Feb 8 17:59:30 2015 and finished after 4 seconds
    total bytes scrubbed: 240.29MiB with 0 errors

# umount /mnt/btr
# losetup -d /dev/loop3
# rm -f b4
# mount /dev/loop0 /mnt/btr
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.
# mount /dev/loop0 /mnt/btr -o degraded
# btrfs device delete /dev/loop4 /mnt/btr
ERROR: error removing the device '/dev/loop4' - No space left on device
# rm -rf <somefiles>
# btrfs device delete /dev/loop4 /mnt/btr

No errors for about 1 minute, and then the kernel panic shown in the attached photo. The sshd shell was frozen, and there was no console access.
Created attachment 166111
journal with crash stack trace

Looks like the disk was deleted OK, but then 32 seconds later there's a BUG warning and the oops. Attached is the full journalctl -b-1 -l -o short-monotonic output.

[29624.382790] f22s.localdomain kernel: BTRFS info (device loop4): disk deleted /dev/loop4
[29656.902166] f22s.localdomain kernel: BUG: unable to handle kernel NULL pointer dereference at 0000005c

The last timestamp in the journal is [29656.903019], while the first timestamp in the photo is 29663, so about 7 seconds are missing (lost) between this journal and the photo.
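
The 32-second figure can be checked from the two journal lines quoted in this comment. This is a minimal sketch, assuming a POSIX shell with awk. `log_gap` is a hypothetical helper name, not part of journalctl or dmesg. It reads log lines on stdin and prints the seconds between the first and last bracketed monotonic timestamps.

```shell
# log_gap: hypothetical helper (not a journalctl/dmesg feature).
# Reads log lines on stdin, extracts the leading "[seconds.micros]" stamp
# from each, and prints last - first in seconds.
log_gap() {
    awk '
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the brackets; "+ 0" forces numeric conversion and
            # tolerates the space padding dmesg uses, e.g. "[  624.764677]".
            t = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (!seen++) first = t
            last = t
        }
        END { if (seen) printf "%.6f\n", last - first }
    '
}

printf '%s\n' \
    '[29624.382790] f22s.localdomain kernel: BTRFS info (device loop4): disk deleted /dev/loop4' \
    '[29656.902166] f22s.localdomain kernel: BUG: unable to handle kernel NULL pointer dereference at 0000005c' \
    | log_gap
# prints 32.519376
```

The same helper should work on plain dmesg output, since it only looks at the bracketed timestamp.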
Is it even a good idea for Btrfs to permit device removal from a degraded volume? It seems like this shouldn't be permitted, or at best permitted only with a --force flag. Best practice is probably to add a device, remount in normal (non-degraded) mode, and then do the device delete. I don't know that anything good comes from removing a device from a degraded volume.
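
One possible reading of that ordering, as a hedged sketch rather than a verified fix: restore redundancy first, remount normally, and only then shrink. It swaps in `btrfs replace start` by devid as one way to bring in a new device in place of the missing one; the comment itself only says "add a device". `/dev/loop5` is a hypothetical fresh loop device, and devid 4 is the detached /dev/loop3 from the `btrfs fi show` output in the report. `safe_remove_plan` is a hypothetical helper that only prints the commands and does not run them. Whether this sequence avoids the oops has not been tested here.

```shell
# safe_remove_plan: hypothetical helper that PRINTS a command sequence only.
# Args: mountpoint, a still-present member device, devid of the missing
# device, replacement device (assumed to exist), device to remove afterwards.
safe_remove_plan() {
    mnt=$1 present=$2 missing_devid=$3 new_dev=$4 victim=$5
    printf '%s\n' \
        "mount -o degraded $present $mnt" \
        "btrfs replace start -B $missing_devid $new_dev $mnt" \
        "umount $mnt" \
        "mount $present $mnt" \
        "btrfs device delete $victim $mnt"
}

# /dev/loop5 is an assumption; it does not appear in the report.
safe_remove_plan /mnt/btr /dev/loop0 4 /dev/loop5 /dev/loop4
```

Printing instead of executing keeps the sketch harmless on a machine without the loop devices. The lines can be reviewed and run by hand as root.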
Created attachment 166261
dmesg 3.19.0

This is reproducible in simplified form, by not first overfilling the volume (thus avoiding ENOSPC).

# mount -o degraded /dev/loop0 /mnt/btr
# btrfs dev delete /dev/loop3 /mnt/btr
## wait 100s
# umount /mnt/btrfs

[ 624.764677] BTRFS info (device loop3): disk deleted /dev/loop3
[ 723.596939] BUG: unable to handle kernel NULL pointer dereference at 0000005c
[ 723.597019] IP: [<c06b2de8>] bio_get_nr_vecs+0x8/0x40
[ 723.597019] *pde = 00000000
[ 723.597019] Oops: 0000 [#1] SMP

The full dmesg could be captured this time; that is what's attached.