Bug 209543 - paging to thin-provisioned Logical Volume triggers workqueue lockup
Summary: paging to thin-provisioned Logical Volume triggers workqueue lockup
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All
OS: Linux
Importance: P1 high
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-06 12:10 UTC by Ulrich.Windl
Modified: 2021-10-07 13:54 UTC
CC List: 0 users

See Also:
Kernel Version: 5.3.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Console screen shot when system had locked up for more than an hour (398.58 KB, image/png)
2020-10-06 12:10 UTC, Ulrich.Windl
Contents of top window when system locked up (3.92 KB, text/plain)
2020-10-06 12:14 UTC, Ulrich.Windl
Konsole screenshot when system locked up (taken after display had frozen for about 4 minutes) (253.51 KB, image/png)
2020-12-16 11:08 UTC, Ulrich.Windl

Description Ulrich.Windl 2020-10-06 12:10:27 UTC
Created attachment 292851 [details]
Console screen shot when system had locked up for more than an hour

I have a reproducible scenario in which a machine with 4 sockets (Intel Xeon Gold 6240; 4*18 cores, 4*36 threads) completely locks up.

The effect is triggered shortly after the system starts swapping, and the necessary condition is that the swap space resides on a thin-provisioned LVM LV. That means LVM has to allocate new blocks for the swap area while swapping out.

The program that triggers the effect allocates 100GB of RAM (the machine has 768GB), then forks off 8 processes that each shift that block of memory up and down by a few bytes. Effectively this unshares the once-shared memory (CoW), forcing the system into swapping. Once a significant amount of swap is in use, the system locks up.
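
To make the reproduction scenario concrete, a minimal sketch of this kind of workload could look like the code below. This is not the reporter's actual "mkmload" program; the block size, child count and shift distance are illustrative assumptions (a real run needs a block large enough to exceed available RAM so the system actually starts swapping).

/*
 * Minimal sketch of the workload described above (NOT the actual
 * "mkmload" program).  A large anonymous mapping is touched once, then
 * N forked children keep shifting the block by a few bytes, so every
 * write triggers a copy-on-write fault and the memory is unshared.
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLOCK_SIZE   (1UL << 30)   /* 1 GiB here; the report used 100GB */
#define NUM_CHILDREN 8
#define SHIFT_BYTES  16

int main(void)
{
    unsigned char *block = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (block == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(block, 0xAA, BLOCK_SIZE);    /* populate the initially shared pages */

    for (int i = 0; i < NUM_CHILDREN; i++) {
        if (fork() == 0) {
            /* Each write after fork() hits a CoW fault, so the shared
             * block is gradually unshared in every child. */
            for (;;) {
                memmove(block + SHIFT_BYTES, block, BLOCK_SIZE - SHIFT_BYTES);
                memmove(block, block + SHIFT_BYTES, BLOCK_SIZE - SHIFT_BYTES);
            }
        }
    }
    while (wait(NULL) > 0)
        ;   /* children loop forever; run until the test is aborted */
    return 0;
}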

The swap configured on the system uses two devices: a 5GB partition at priority 2 and a 750GB LV at priority 4. The partition is on a RAID1 device with two SSDs, and the PV of the LV is on a RAID6 with five SSDs. The RAID controller is a PERC H730P in a Dell PowerEdge R840.
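
For reference, the priority ordering described here is what the swapon(2) syscall encodes in its flags argument. A minimal sketch follows; the device paths are placeholders, not the actual ones on this system, and in practice the setup is done via /etc/fstab or "swapon -p" rather than a custom program.

/*
 * Sketch of how two swap devices with explicit priorities map onto the
 * swapon(2) flags argument.  Device paths are placeholders.
 */
#include <stdio.h>
#include <sys/swap.h>

static int enable_swap(const char *path, int prio)
{
    int flags = SWAP_FLAG_PREFER |
                ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);
    if (swapon(path, flags) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}

int main(void)
{
    enable_swap("/dev/md0p1", 2);           /* 5GB partition on RAID1, prio 2 */
    enable_swap("/dev/vg00/lv_swap", 4);    /* 750GB thin LV on RAID6, prio 4 */
    return 0;
}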

The OS being used is SLES15 SP2 using kernel 5.3.18-24.15-default and LVM 2.03.05.

When the system locks up, all SSH sessions die silently (PuTTY does not detect that the connection is gone), I cannot start any new SSH sessions, nor can I switch virtual consoles or log in there. However, ping still gets a response.
As I was expecting a lockup, I had a top running with a 0.2s refresh interval, and that top also stopped refreshing (of course).

At the moment of lockup I see four kswapd[0-3], a kworker/u288:1+dm-thin, a kworker/43:1-mm_percpu_wq and several more kworkers, ksoftirqds and one migration process. There is plenty of swap space available (less than 7GB used), 15 processes running, 1429 sleeping and a load of 6.
Comment 1 Ulrich.Windl 2020-10-06 12:14:57 UTC
Created attachment 292853 [details]
Contents of top window when system locked up

Here is the output of top when the system locked up ("mkmload" is my test program).
Comment 2 Ulrich.Windl 2020-10-07 13:40:28 UTC
An additional note: when redoing the test, I pressed ^C in the remote SSH terminal once the top running in another SSH terminal stopped refreshing. I left the machine "running". It seems it recovered about 8 hours after that, but all syslog messages created during that time were missing...
Comment 3 Ulrich.Windl 2020-12-16 11:08:30 UTC
Created attachment 294153 [details]
Konsole screenshot when system locked up (taken after display had frozen for about 4 minutes)

An update: I could reproduce the problem on a different machine (Dell PowerEdge R7415) with less RAM (256GB) and fewer CPUs (a 4-node NUMA AMD EPYC 7401P, 24 cores) and with a newer kernel (5.3.18-24.43-default of SLES15 SP2).
It seems to me that the problem occurred right at the moment when the first page had to be paged out.
Comment 4 Ulrich.Windl 2021-10-07 13:53:25 UTC
After having another lockup (in 5.3.18-24.70 provided by SUSE) that was not caused by memory pressure, but still involved "swapper", I wondered whether the problem might originate from IRQs not being re-enabled under specific conditions.
Comment 5 Ulrich.Windl 2021-10-07 13:54:37 UTC
Sorry (In reply to Ulrich.Windl from comment #4)
> the problem might originate from IRQs not being re-enabled wonder specific

s/wonder/under/
