Bug 205135

Summary: System hang up when memory swapping (kswapd deadlock)
Product: Memory Management Reporter: GoodMirek (goodmirek)
Component: Page AllocatorAssignee: Andrew Morton (akpm)
Status: RESOLVED OBSOLETE    
Severity: normal CC: goodmirek, haydn.reysenbach, mricon, samoht0-bugzilla, taz.007
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.4.0 and 5.3 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel log 5.4.0-0.rc1.git1.1.fc32.x86_64

Description GoodMirek 2019-10-09 12:50:17 UTC
Created attachment 285419 [details]
kernel log 5.4.0-0.rc1.git1.1.fc32.x86_64

1. Please describe the problem:
I run Fedora 31 with KDE Plasma DE.
Swapping with 5.2.17 works fine, even when using 4GB of swap for several days. Swapping with 5.3.2, 5.3.5 or 5.4.0rc1 causes a complete freeze, usually soon after swapping occurs, i.e. screen freezes up, no mouse movement, no TTY access. I did not try SYSRQ keys. While the issue is happening, the disk LED indicates no disk activity, although the LED works otherwise.

System under test:
HP Elitebook 850 G4
CPU: Intel i5-7200U with embedded GPU
RAM: 4GB unbuffered, memtest OK
Disk: SSD Samsung PM961 (256GB), LVM+LUKS, root volume with XFS filesystem
Swap: swapping to file of size 20GB at path /swapfile
I/O Scheduler: BFQ

2. What is the Version-Release number of the kernel:
5.4.0-0.rc1.git1.1.fc32.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
It works with 5.2.17

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
- enable swap
- allocate enough RAM so system starts swapping, use e.g. multiple web browser tabs with a memory consuming webpages
- after while, usually in less than two hours, the system freezes

6. Are you running any modules that not shipped with directly Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.


I see following warning in dmesg:

<truncated>
Oct 09 13:47:08 kernel: WARNING: possible circular locking dependency detected
Oct 09 13:47:08 kernel: 5.4.0-0.rc1.git1.1.fc32.x86_64 #1 Not tainted

<truncated>

*** DEADLOCK ***
Oct 09 13:47:08 kernel: 4 locks held by kswapd0/157:
Oct 09 13:47:08 kernel:  #0: ffffffff83781540 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
Oct 09 13:47:08 kernel:  #1: ffffffff837743d8 (shrinker_rwsem){++++}, at: shrink_slab+0x134/0x2b0
Oct 09 13:47:08 kernel:  #2: ffff8e2e0dd920e8 (&type->s_umount_key#56){++++}, at: trylock_super+0x16/0x50
Oct 09 13:47:08 kernel:  #3: ffff8e2e0dd57a58 (&pag->pag_ici_reclaim_lock){+.+.}, at: xfs_reclaim_inodes_ag+0x95/0x450 [xfs]

Full dmesg is attached.
Comment 1 GoodMirek 2019-10-10 08:49:48 UTC
Only known recovery from the freeze is a hard reset. This is not a temporary freeze, but system hang up.
Comment 2 GoodMirek 2019-10-11 07:41:20 UTC
kernel-5.4.0-0.rc2.git1.1.fc32.x86_64 is also affected.
Comment 7 GoodMirek 2019-10-22 09:02:22 UTC
Everyone who uses a swapfile on XFS filesystem seem affected by this hang up. Not sure about other filesystems, I did not have a chance to test it elsewhere.

This unreproduced bot crash could be related:
https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/
Comment 8 Andrew Morton 2019-10-22 22:24:23 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=205135
> 
> --- Comment #7 from goodmirek@goodmirek.com ---
> Everyone who uses a swapfile on XFS filesystem seem affected by this hang up.
> Not sure about other filesystems, I did not have a chance to test it
> elsewhere.
> 
> This unreproduced bot crash could be related:
> https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/

Thanks.  Might be core MM, might be XFS, might be Fedora.

Hilf, does your patch look related?  That seems to have gone quiet?

Should we progress Tetsuo's patch?
Comment 9 darrick.wong 2019-10-23 01:22:39 UTC
On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=205135
> > 
> > --- Comment #7 from goodmirek@goodmirek.com ---
> > Everyone who uses a swapfile on XFS filesystem seem affected by this hang
> up.
> > Not sure about other filesystems, I did not have a chance to test it
> elsewhere.
> > 
> > This unreproduced bot crash could be related:
> > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/
> 
> Thanks.  Might be core MM, might be XFS, might be Fedora.
> 
> Hilf, does your patch look related?  That seems to have gone quiet?
> 
> Should we progress Tetsuo's patch?

Hmm...

Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64 (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019

...istr 5.4-rc1 had some writeback bugs in it...

                        -> #1 (fs_reclaim){+.+.}:
Oct 09 13:47:08 kernel:        fs_reclaim_acquire.part.0+0x25/0x30
Oct 09 13:47:08 kernel:        __kmalloc+0x4f/0x330
Oct 09 13:47:08 kernel:        kmem_alloc+0x83/0x1a0 [xfs]
Oct 09 13:47:08 kernel:        kmem_alloc_large+0x3c/0x100 [xfs]
Oct 09 13:47:08 kernel:        xfs_attr_copy_value+0x5d/0xa0 [xfs]
Oct 09 13:47:08 kernel:        xfs_attr_get+0xe7/0x1d0 [xfs]
Oct 09 13:47:08 kernel:        xfs_get_acl+0xad/0x1e0 [xfs]
Oct 09 13:47:08 kernel:        get_acl+0x81/0x110
Oct 09 13:47:08 kernel:        posix_acl_create+0x58/0x160
Oct 09 13:47:08 kernel:        xfs_generic_create+0x7e/0x2f0 [xfs]
Oct 09 13:47:08 kernel:        lookup_open+0x5bd/0x820
Oct 09 13:47:08 kernel:        path_openat+0x340/0xcb0
Oct 09 13:47:08 kernel:        do_filp_open+0x91/0x100
Oct 09 13:47:08 kernel:        do_sys_open+0x184/0x220
Oct 09 13:47:08 kernel:        do_syscall_64+0x5c/0xa0
Oct 09 13:47:08 kernel:        entry_SYSCALL_64_after_hwframe+0x49/0xbe

That's XFS trying to allocate memory to load an acl off disk, only it
looks this thread does a MAYFAIL allocation.  It's a GFP_FS (since we
don't set KM_NOFS) allocation so we recurse into fs reclaim, and the
ACL-getter has locked the inode (which is probably why lockdep
triggers).  I wonder if that's really a deadlock vs. just super-slow
behavior, but otoh I don't think we're supposed to allow reclaim to jump
into the filesystems when the fs has locks held.

That kmem_alloc_large should probably be changed to KM_NOFS.  Dave?

--D
Comment 10 Su Yue 2019-10-23 02:37:40 UTC
Just to remind, running xfstests/generic/273 could trigger the lockdep
deadlock warning.

--
Su

On 2019/10/23 9:22 AM, Darrick J. Wong wrote:
> On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote:
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org
>> wrote:
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=205135
>>>
>>> --- Comment #7 from goodmirek@goodmirek.com ---
>>> Everyone who uses a swapfile on XFS filesystem seem affected by this hang
>>> up.
>>> Not sure about other filesystems, I did not have a chance to test it
>>> elsewhere.
>>>
>>> This unreproduced bot crash could be related:
>>> https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/
>>
>> Thanks.  Might be core MM, might be XFS, might be Fedora.
>>
>> Hilf, does your patch look related?  That seems to have gone quiet?
>>
>> Should we progress Tetsuo's patch?
>
> Hmm...
>
> Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64
> (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red
> Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019
>
> ...istr 5.4-rc1 had some writeback bugs in it...
>
>                          -> #1 (fs_reclaim){+.+.}:
> Oct 09 13:47:08 kernel:        fs_reclaim_acquire.part.0+0x25/0x30
> Oct 09 13:47:08 kernel:        __kmalloc+0x4f/0x330
> Oct 09 13:47:08 kernel:        kmem_alloc+0x83/0x1a0 [xfs]
> Oct 09 13:47:08 kernel:        kmem_alloc_large+0x3c/0x100 [xfs]
> Oct 09 13:47:08 kernel:        xfs_attr_copy_value+0x5d/0xa0 [xfs]
> Oct 09 13:47:08 kernel:        xfs_attr_get+0xe7/0x1d0 [xfs]
> Oct 09 13:47:08 kernel:        xfs_get_acl+0xad/0x1e0 [xfs]
> Oct 09 13:47:08 kernel:        get_acl+0x81/0x110
> Oct 09 13:47:08 kernel:        posix_acl_create+0x58/0x160
> Oct 09 13:47:08 kernel:        xfs_generic_create+0x7e/0x2f0 [xfs]
> Oct 09 13:47:08 kernel:        lookup_open+0x5bd/0x820
> Oct 09 13:47:08 kernel:        path_openat+0x340/0xcb0
> Oct 09 13:47:08 kernel:        do_filp_open+0x91/0x100
> Oct 09 13:47:08 kernel:        do_sys_open+0x184/0x220
> Oct 09 13:47:08 kernel:        do_syscall_64+0x5c/0xa0
> Oct 09 13:47:08 kernel:        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> That's XFS trying to allocate memory to load an acl off disk, only it
> looks this thread does a MAYFAIL allocation.  It's a GFP_FS (since we
> don't set KM_NOFS) allocation so we recurse into fs reclaim, and the
> ACL-getter has locked the inode (which is probably why lockdep
> triggers).  I wonder if that's really a deadlock vs. just super-slow
> behavior, but otoh I don't think we're supposed to allow reclaim to jump
> into the filesystems when the fs has locks held.
>
> That kmem_alloc_large should probably be changed to KM_NOFS.  Dave?
>
> --D
>
Comment 11 Dave Chinner 2019-10-23 06:49:13 UTC
On Tue, Oct 22, 2019 at 06:22:28PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote:
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=205135
> > > 
> > > --- Comment #7 from goodmirek@goodmirek.com ---
> > > Everyone who uses a swapfile on XFS filesystem seem affected by this hang
> up.
> > > Not sure about other filesystems, I did not have a chance to test it
> elsewhere.
> > > 
> > > This unreproduced bot crash could be related:
> > > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/
> > 
> > Thanks.  Might be core MM, might be XFS, might be Fedora.
> > 
> > Hilf, does your patch look related?  That seems to have gone quiet?
> > 
> > Should we progress Tetsuo's patch?
> 
> Hmm...
> 
> Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64
> (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red
> Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019
> 
> ...istr 5.4-rc1 had some writeback bugs in it...
> 
>                         -> #1 (fs_reclaim){+.+.}:
> Oct 09 13:47:08 kernel:        fs_reclaim_acquire.part.0+0x25/0x30
> Oct 09 13:47:08 kernel:        __kmalloc+0x4f/0x330
> Oct 09 13:47:08 kernel:        kmem_alloc+0x83/0x1a0 [xfs]
> Oct 09 13:47:08 kernel:        kmem_alloc_large+0x3c/0x100 [xfs]
> Oct 09 13:47:08 kernel:        xfs_attr_copy_value+0x5d/0xa0 [xfs]
> Oct 09 13:47:08 kernel:        xfs_attr_get+0xe7/0x1d0 [xfs]
> Oct 09 13:47:08 kernel:        xfs_get_acl+0xad/0x1e0 [xfs]
> Oct 09 13:47:08 kernel:        get_acl+0x81/0x110
> Oct 09 13:47:08 kernel:        posix_acl_create+0x58/0x160
> Oct 09 13:47:08 kernel:        xfs_generic_create+0x7e/0x2f0 [xfs]
> Oct 09 13:47:08 kernel:        lookup_open+0x5bd/0x820
> Oct 09 13:47:08 kernel:        path_openat+0x340/0xcb0
> Oct 09 13:47:08 kernel:        do_filp_open+0x91/0x100
> Oct 09 13:47:08 kernel:        do_sys_open+0x184/0x220
> Oct 09 13:47:08 kernel:        do_syscall_64+0x5c/0xa0
> Oct 09 13:47:08 kernel:        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> That's XFS trying to allocate memory to load an acl off disk, only it
> looks this thread does a MAYFAIL allocation.  It's a GFP_FS (since we
> don't set KM_NOFS) allocation so we recurse into fs reclaim, and the
> ACL-getter has locked the inode (which is probably why lockdep
> triggers).  I wonder if that's really a deadlock vs. just super-slow
> behavior, but otoh I don't think we're supposed to allow reclaim to jump
> into the filesystems when the fs has locks held.
> 
> That kmem_alloc_large should probably be changed to KM_NOFS.  Dave?

I suspect it's a false positive, but without the rest of the lockdep
trace I don't have any context to determine if there is actually a
deadlock vector there.

i.e. the locked inode is referenced and we are not in a transaction
context, so the only reclaim recursion that could attempt to lock it
is dirty page writeback off the LRU from kswapd. i.e. direct
reclaim will never see that inode, nor can I see how it would block
on it. e.g. it's no different from doing memory allocation for BMBT
metadata blocks with the XFS_ILOCK held when reading in the extent
list on a data read or FIEMAP call.

Cheers,

Dave.
Comment 12 Dave Chinner 2019-10-23 07:12:22 UTC
On Wed, Oct 23, 2019 at 05:49:05PM +1100, Dave Chinner wrote:
> On Tue, Oct 22, 2019 at 06:22:28PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote:
> > > 
> > > (switched to email.  Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
> > > 
> > > On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
> > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=205135
> > > > 
> > > > --- Comment #7 from goodmirek@goodmirek.com ---
> > > > Everyone who uses a swapfile on XFS filesystem seem affected by this
> hang up.
> > > > Not sure about other filesystems, I did not have a chance to test it
> elsewhere.
> > > > 
> > > > This unreproduced bot crash could be related:
> > > >
> https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/
> > > 
> > > Thanks.  Might be core MM, might be XFS, might be Fedora.
> > > 
> > > Hilf, does your patch look related?  That seems to have gone quiet?
> > > 
> > > Should we progress Tetsuo's patch?
> > 
> > Hmm...
> > 
> > Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64
> (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red
> Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019
> > 
> > ...istr 5.4-rc1 had some writeback bugs in it...
> > 
> >                         -> #1 (fs_reclaim){+.+.}:
> > Oct 09 13:47:08 kernel:        fs_reclaim_acquire.part.0+0x25/0x30
> > Oct 09 13:47:08 kernel:        __kmalloc+0x4f/0x330
> > Oct 09 13:47:08 kernel:        kmem_alloc+0x83/0x1a0 [xfs]
> > Oct 09 13:47:08 kernel:        kmem_alloc_large+0x3c/0x100 [xfs]
> > Oct 09 13:47:08 kernel:        xfs_attr_copy_value+0x5d/0xa0 [xfs]
> > Oct 09 13:47:08 kernel:        xfs_attr_get+0xe7/0x1d0 [xfs]
> > Oct 09 13:47:08 kernel:        xfs_get_acl+0xad/0x1e0 [xfs]
> > Oct 09 13:47:08 kernel:        get_acl+0x81/0x110
> > Oct 09 13:47:08 kernel:        posix_acl_create+0x58/0x160
> > Oct 09 13:47:08 kernel:        xfs_generic_create+0x7e/0x2f0 [xfs]
> > Oct 09 13:47:08 kernel:        lookup_open+0x5bd/0x820
> > Oct 09 13:47:08 kernel:        path_openat+0x340/0xcb0
> > Oct 09 13:47:08 kernel:        do_filp_open+0x91/0x100
> > Oct 09 13:47:08 kernel:        do_sys_open+0x184/0x220
> > Oct 09 13:47:08 kernel:        do_syscall_64+0x5c/0xa0
> > Oct 09 13:47:08 kernel:        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > That's XFS trying to allocate memory to load an acl off disk, only it
> > looks this thread does a MAYFAIL allocation.  It's a GFP_FS (since we
> > don't set KM_NOFS) allocation so we recurse into fs reclaim, and the
> > ACL-getter has locked the inode (which is probably why lockdep
> > triggers).  I wonder if that's really a deadlock vs. just super-slow
> > behavior, but otoh I don't think we're supposed to allow reclaim to jump
> > into the filesystems when the fs has locks held.
> > 
> > That kmem_alloc_large should probably be changed to KM_NOFS.  Dave?
> 
> I suspect it's a false positive, but without the rest of the lockdep
> trace I don't have any context to determine if there is actually a
> deadlock vector there.

Ok, I've looked at the bz now, and the rest of the trace is kswapd
locking an inode from the superblock shrinker. That means I'm pretty
certain this is a false positive and has nothing to do with whatever
hang is occuring on the user's machine.

These:

Oct 09 14:00:18 kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled

occur when a radix_tree_insert() call fails, but I don't see a
radix_tree_preload() call anywhere around that code to ensure the
radix tree insert has memory available before locks are taken and
the insert is attempted. Ahhhh:

static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);

Seems like that is guaranteed to fail under mempry pressure as it
won't allow memory reclaim to block waiting for progress to be made.

Hence I see nothing in the bug to back up the assertions that
"Everyone who uses a swapfile on XFS filesystem seem affected by
this hang up." There's no evidence at all that even points the
subsystem that has hung. sysrq-w, sysrq-l and sysrq-t output are the
first things we need from that machine to see if/where it is
actually hung...

Cheers,

Dave.
Comment 13 GoodMirek 2020-04-07 05:36:20 UTC
The issue does not happen with kernel 5.5.13.