Bug 204377

Summary: Deadlock with btrfs-transacti
Product: File System
Reporter: Drazen Kacar (drazen.kacar)
Component: btrfs
Assignee: BTRFS virtual assignee (fs_btrfs)
Status: NEW
Severity: normal
CC: stf_xl
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.2.4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg

Description Drazen Kacar 2019-07-30 13:05:21 UTC
Created attachment 284045 (dmesg)

Virtual machine in VMware, 4 CPUs, 8 GB RAM.

# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core)

# uname -a
Linux prod-dbsnap-01 5.2.4-1.el7.elrepo.x86_64 #1 SMP Sun Jul 28 08:10:57 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v5.2.1 

# btrfs qgroup show /data/pg_data
ERROR: can't list qgroups: quotas not enabled

# grep btrfs /proc/mounts 
/dev/sdb /data/pg_data btrfs rw,noatime,compress-force=zstd:3,ssd,space_cache,subvolid=5,subvol=/ 0 0

# btrfs filesystem usage /data/pg_data
Overall:
    Device size:                   2.00TiB
    Device allocated:            872.02GiB
    Device unallocated:            1.15TiB
    Device missing:                  0.00B
    Used:                        843.15GiB
    Free (estimated):              1.17TiB      (min: 1.17TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID0: Size:860.00GiB, Used:834.69GiB
   /dev/sdb      215.00GiB
   /dev/sdc      215.00GiB
   /dev/sdd      215.00GiB
   /dev/sde      215.00GiB

Metadata,RAID0: Size:12.00GiB, Used:8.46GiB
   /dev/sdb        3.00GiB
   /dev/sdc        3.00GiB
   /dev/sdd        3.00GiB
   /dev/sde        3.00GiB

System,RAID0: Size:16.00MiB, Used:64.00KiB
   /dev/sdb        4.00MiB
   /dev/sdc        4.00MiB
   /dev/sdd        4.00MiB
   /dev/sde        4.00MiB

Unallocated:
   /dev/sdb      294.00GiB
   /dev/sdc      294.00GiB
   /dev/sdd      294.00GiB
   /dev/sde      294.00GiB

I got a deadlock between the btrfs-transacti kernel thread and the postgres processes.

# mpstat -P ALL 2
Linux 5.2.4-1.el7.elrepo.x86_64 (prod-dbsnap-01)        07/30/2019      _x86_64_        (4 CPU)

02:55:16 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:55:18 PM  all    0.25    0.00    0.25   25.06    0.00    0.00    0.00    0.00    0.00   74.43
02:55:18 PM    0    0.51    0.00    0.51    0.00    0.00    0.00    0.00    0.00    0.00   98.99
02:55:18 PM    1    0.00    0.00    0.51    0.00    0.00    0.00    0.00    0.00    0.00   99.49
02:55:18 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
02:55:18 PM    3    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00

02:55:18 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:55:20 PM  all    0.00    0.00    0.25   24.75    0.00    0.00    0.00    0.00    0.00   75.00
02:55:20 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
02:55:20 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
02:55:20 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
02:55:20 PM    3    0.51    0.00    0.51   98.99    0.00    0.00    0.00    0.00    0.00    0.00

...

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    0.13    0.00    0.25   24.92    0.00    0.00    0.00    0.00    0.00   74.70
Average:       0    0.22    0.00    0.36    0.00    0.00    0.00    0.00    0.00    0.00   99.42
Average:       1    0.07    0.00    0.14    0.00    0.00    0.00    0.00    0.00    0.00   99.78
Average:       2    0.07    0.00    0.07    0.00    0.00    0.00    0.00    0.00    0.00   99.86
Average:       3    0.14    0.00    0.29   99.57    0.00    0.00    0.00    0.00    0.00    0.00

# ps -p 1411 -o pid,state,wchan:30,comm
   PID S WCHAN                          COMMAND
  1411 D lock_extent_buffer_for_io      btrfs-transacti
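
For anyone trying to reproduce this, the same check can be widened from one PID to every task stuck in uninterruptible sleep. This is only a sketch: `filter_dstate` is a hypothetical helper name, not part of any tool, and it assumes the column order produced by `ps -eo pid,state,wchan:30,comm`.

```shell
# Keep the header line plus every task whose state column is D
# (uninterruptible sleep). Expects `ps -eo pid,state,wchan:30,comm`
# style output on stdin. The function name is hypothetical.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live system:
#   ps -eo pid,state,wchan:30,comm | filter_dstate
```

Tasks that stay in state D with the same WCHAN across repeated runs are the likely participants in the hang.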

Nothing is running on the CPUs, yet CPU 3 sits at close to 100% iowait. I'm attaching dmesg with output obtained from various /proc/sysrq-trigger commands.
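
The report does not say which sysrq keys were used. As a rough sketch of how such a dump is typically collected, the helper below writes a list of keys to the trigger file. `sysrq_dump` and the `SYSRQ_TRIGGER` variable are hypothetical names introduced here, and the choice of `w`, `l` and `t` is an assumption, not what was actually run.

```shell
# Trigger path is overridable so the function can be dry-run against a
# plain file. SYSRQ_TRIGGER and sysrq_dump are hypothetical names.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Write each given sysrq key to the trigger file, one at a time.
# Keys assumed useful for a hang like this one:
#   w: dump tasks in uninterruptible (blocked) state
#   l: backtrace of all active CPUs
#   t: dump the state of all tasks
sysrq_dump() {
    for key in "$@"; do
        printf '%s\n' "$key" > "$SYSRQ_TRIGGER" || return 1
    done
}

# Usage as root on the affected machine:
#   sysrq_dump w l t && dmesg > dmesg.txt
```

The output lands in the kernel log, so it has to be saved with dmesg afterwards.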