Bug 204563 - GPF RIP: 0010:z3fold_zpool_map+0x50/0xf0
Summary: GPF RIP: 0010:z3fold_zpool_map+0x50/0xf0
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-13 03:48 UTC by Chris Murphy
Modified: 2019-11-06 16:09 UTC
CC List: 5 users

See Also:
Kernel Version: 5.2.8-200.fc30.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (656.38 KB, text/plain) - 2019-08-13 03:49 UTC, Chris Murphy
dmesg2 (2.31 MB, text/plain) - 2019-08-13 04:07 UTC, Chris Murphy
proposed patch (2.70 KB, patch) - 2019-08-19 15:25 UTC, Vitaly
proposed patch (1.17 KB, patch) - 2019-08-20 15:40 UTC, Vitaly
5.3.0-0.rc5 z3fold, no patch (4.42 MB, text/plain) - 2019-08-22 02:38 UTC, Chris Murphy
proposed patch (new) (1.54 KB, patch) - 2019-08-27 05:56 UTC, Vitaly
console-1566937070.992386586.log v5.3.0-rc6+z3fold5.diff (9.00 KB, text/plain) - 2019-08-28 09:18 UTC, Markus Linnala
console-1566935017.712375433.log v5.3.0-rc6+z3fold5.diff (33.23 KB, text/plain) - 2019-08-28 09:18 UTC, Markus Linnala
console-1566964471.551922729.log v5.3.0-rc6+z3fold5.diff (8.72 KB, text/plain) - 2019-08-28 09:19 UTC, Markus Linnala
dmesg-boot-5.3.0-rc6-z3fold5+-1566964471.551922729.txt (48.57 KB, text/plain) - 2019-08-28 09:19 UTC, Markus Linnala
symbols_5.3.0_rc6_z3fold5.txt (19.09 KB, text/plain) - 2019-08-28 09:19 UTC, Markus Linnala
console-1566979205.492526522.log v5.3.0-rc6 (11.57 KB, text/plain) - 2019-08-28 09:22 UTC, Markus Linnala
dmesg-boot-5.3.0-rc6-1566979205.492526522.txt (47.06 KB, text/plain) - 2019-08-28 09:22 UTC, Markus Linnala
symbols_5.3.0-rc6.txt (11.02 KB, text/plain) - 2019-08-28 09:22 UTC, Markus Linnala
symbols_5.3.0-rc6.txt fixed (10.97 KB, text/plain) - 2019-08-28 11:08 UTC, Markus Linnala
patch3 journal (3.54 MB, text/plain) - 2019-08-28 16:26 UTC, Chris Murphy
Proposed patch (verified) (4.86 KB, patch) - 2019-09-08 11:05 UTC, Vitaly

Description Chris Murphy 2019-08-13 03:48:58 UTC
MacbookPro 8,2 (2011)
Intel i7-2820QM 4/8 core, 8GiB RAM, 8GiB swap on SSD plain partition
Samsung SSD 840 EVO 250GB

Command line includes: zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=z3fold

1. Build webkit per the top section of the page below; this amounts to running the first two commands, and the problem occurs while `ninja` is executing the build:
https://trac.webkit.org/wiki/BuildingGtk

About midway through there's heavy swapping, and then the general protection fault happens. It doesn't happen when zswap is not in use. I haven't yet tested zbud instead of z3fold.
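
As a rough way to watch zswap activity while the build runs (a minimal sketch; assumes debugfs is mounted at /sys/kernel/debug):

# each debugfs counter prints as name:value; stored_pages and
# written_back_pages show when compressed pages start being written back
sudo watch -n1 'grep . /sys/kernel/debug/zswap/*'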
Comment 1 Chris Murphy 2019-08-13 03:49:56 UTC
Created attachment 284347 [details]
dmesg
Comment 2 Chris Murphy 2019-08-13 04:07:21 UTC
Created attachment 284349 [details]
dmesg2

Same thing with 5.3.0-0.rc3.git0.1.fc31.x86_64+debug

During the heavy swap period, before the GPF, I issued sysrq+t for additional information on system state.
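
For reference, the same task-state dump can also be triggered from a root shell when no SysRq key is handy:

echo t > /proc/sysrq-trigger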
Comment 3 Chris Murphy 2019-08-13 05:38:40 UTC
Retested with zbud, and this problem doesn't happen.
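
For reference, the zbud retest only requires swapping the allocator in the kernel command line from the description, e.g.:

zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=zbud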
Comment 4 Vitaly 2019-08-19 15:25:57 UTC
Created attachment 284507 [details]
proposed patch

Would it be possible for you to try applying the patch and retesting? Thanks!
Comment 5 Chris Murphy 2019-08-19 19:23:30 UTC
Hi Vitaly,
Yes, perhaps later this week. In the meantime, a question:

This documentation:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/vm/zswap.rst?h=v5.2.9

says "zswap is a work in progress and should be considered experimental". Is that still an accurate assessment, and you would not recommend a distribution enabling it by default with out of the box installations? The alternative the Fedora Workstation working group is exploring is two swaps: one with higher priority on a ZRAM device, and a second conventional swap on disk partition with lower priority. The use case sounds ideal for zswap and its less complicated, but if zswap is experimental still, I think it's a problem. Relates to this downstream issue:
https://pagure.io/fedora-workstation/issue/98
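
For illustration, the two-swap layout described above could look roughly like this (a sketch only; device names, sizes, and priorities are placeholders):

modprobe zram
zramctl --find --size 4G --algorithm lz4   # prints the allocated device, e.g. /dev/zram0
mkswap /dev/zram0
swapon -p 100 /dev/zram0    # higher priority: used first
swapon -p 10 /dev/sdXn      # conventional disk swap, used as overflow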
Comment 6 Vitaly 2019-08-20 15:38:23 UTC
Hi Chris,

I believe zswap is stable enough to be used in a product, and so is ZRAM, which is used in virtually all recent Android devices. In my day-to-day work I deal mostly with ZRAM rather than zswap, which is why this z3fold update exhibited bugs that were only reproducible with zswap. That said, I'm sure we'll get to a good level of stability before the end of summer.
Comment 7 Vitaly 2019-08-20 15:40:29 UTC
Created attachment 284533 [details]
proposed patch
Comment 8 Chris Murphy 2019-08-20 20:04:43 UTC
The Fedora kernel team did a scratch build with the first patch on top of 5.3.0-rc5, which I just tested on the same system with the same example test. It failed very quickly, in less than 5 minutes. The entire system froze: the GUI, the mouse pointer, and the remote ssh connection used to monitor the test, from which I was eventually disconnected. It did not recover after 30 minutes in this state, so I forced power off.

Unfortunately the systemd journal only captured the possible culprit but not the cause; the call trace never made it to the log:

[    0.000000] fmac.local kernel: Linux version 5.3.0-0.rc5.git0.1.zswapfix.fc32.x86_64 (mockbuild@buildvm-21.phx2.fedoraproject.org) (gcc version 9.1.1 20190605 (Red Hat 9.1.1-2) (GCC)) #1 SMP Mon Aug 19 21:58:44 UTC 2019
[    0.000000] fmac.local kernel: Command line: BOOT_IMAGE=(hd5,gpt4)/vmlinuz-5.3.0-0.rc5.git0.1.zswapfix.fc32.x86_64 root=UUID=72df6d5b-26d1-47ff-a9ab-33f6a0b2c4cf ro rootflags=subvol=root log_buf_len=4M systemd.debug-shell=1 printk.devkmsg=on zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=z3fold
[snip]
[    1.573007] fmac.local kernel: zswap: loaded using pool lz4/z3fold
[snip]
[   14.135506] fmac.local kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f0: link becomes ready
[  125.274048] fmac.local kernel: Bluetooth: RFCOMM TTY layer initialized
[  125.274056] fmac.local kernel: Bluetooth: RFCOMM socket layer initialized
[  125.274143] fmac.local kernel: Bluetooth: RFCOMM ver 1.11
[  125.505707] fmac.local kernel: rfkill: input handler disabled
[  563.626998] fmac.local kernel: ------------[ cut here ]------------
[  563.627015] fmac.local kernel: WARNING: CPU: 5 PID: 176 at mm/z3fold.c:426 handle_to_buddy.cold+0xc/0x13
[  563.627016] fmac.local kernel: Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables bnep intel_rapl_msr intel_rapl_common b43 cordic sunrpc mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_cirrus kvm snd_hda_codec_generic ledtrig_audio snd_hda_intel irqbypass cfg80211 snd_hda_codec uvcvideo btusb nls_utf8 btrtl btbcm videobuf2_vmalloc hfsplus btintel videobuf2_memops crct10dif_pclmul snd_hda_core videobuf2_v4l2 crc32_pclmul snd_hwdep ssb iTCO_wdt bluetooth ghash_clmulni_intel libarc4 mei_hdcp joydev iTCO_vendor_support snd_seq

That is the last entry that successfully made it to the disk.
Comment 9 Chris Murphy 2019-08-22 02:17:47 UTC
Patch 2 does not seem to help on top of rc5. These are the last items in the log before the complete hang (which included the ssh session):

Aug 21 19:09:23 fmac.local kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled
Aug 21 19:09:23 fmac.local kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled
Aug 21 19:09:23 fmac.local kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled
Aug 21 19:09:23 fmac.local kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled
Aug 21 19:09:23 fmac.local kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled
Aug 21 19:09:23 fmac.local kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled
Aug 21 19:10:33 fmac.local kernel: cc1plus (5788) used greatest stack depth: 10768 bytes left
Comment 10 Chris Murphy 2019-08-22 02:38:58 UTC
Created attachment 284563 [details]
5.3.0-0.rc5 z3fold, no patch

I decided to redo the test with unpatched 5.3.0-0.rc5.git0.1.fc31.x86_64, still using z3fold, with the same webkit build torture test. The system goes through periods of unresponsiveness (same as with zbud, no zswap), eventually emits a bunch of call traces, and then becomes responsive again, but it is very unstable and 'reboot -f' as root did not reboot the system. I do have the full kernel messages for this, including a sysrq+t just as the system first starts becoming unresponsive, followed by a bunch of zswap-related call traces, and another sysrq+t a few minutes before 'reboot -f'.
Comment 11 Markus Linnala 2019-08-22 13:03:01 UTC
I've been debugging this issue.

First you need to set up a virtual machine with a serial console. When this issue happens, swap is almost always stuck, and since triggering the issue requires creating memory pressure, you end up with all kinds of services in swap. Those services are then unreliable (they are stuck), among them ssh, your shell, and systemd-journald. Usually there is no way to get the relevant logs once the situation has happened, and the machine does not reliably shut down either. So a serial console is the easiest way to get logs.

I've configured the console to write to a file, but the file needs monitoring: sometimes the kernel goes into a call-trace loop and fills it very quickly, it gets rotated, and so on. In any case it is pointless to continue a test after the first GPF.
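
A minimal sketch of the libvirt setup I mean (the log path is just an example; the guest kernel also needs console=ttyS0 on its command line):

<serial type='file'>
  <source path='/var/log/libvirt/consoles/testvm.log'/>
  <target port='0'/>
</serial>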

I use "virsh destroy" to stop testing machine. It is only reliable way to stop it from running indefinitely.

I have used zswap for some time, but the commits starting with 7c2b8baa61fe578af905342938ad12f8dbaeae79 broke it (found via git bisect).

For now I suggest not using zswap+z3fold in production if your kernel is 5.2 or newer, until this bug is fixed. It is really easy to trigger issues under memory pressure when programs keep starting, allocating memory for a while, and then exiting.

To disable zswap on a running machine:

sudo bash -c 'echo 0 > /sys/module/zswap/parameters/enabled'

Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1740690
Comment 12 Vitaly 2019-08-27 05:56:17 UTC
Created attachment 284619 [details]
proposed patch (new)

The issue seems to be related to improper handling of retries in z3fold_reclaim_page(). Chris, would you please try the new patch?
Comment 13 Markus Linnala 2019-08-28 09:16:52 UTC
I have a testing setup for reproducing zswap issues.

Current testing runs in two Fedora 30 virtual machines with 4 vCPUs, about 1 GiB of memory, and 2 GiB of swap each. The host is Fedora 30 with an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz.

The test script is equivalent to running this in one connection:

for tmout in 10 10 10 10 10 10 20 20 20 30 30 900 3600; do
    # nproc+2 workers, each sized to a per-CPU share of MemAvailable, so
    # the total slightly exceeds available memory and forces swapping
    stress --vm $(($(nproc)+2)) --vm-bytes $(($(awk '/MemAvail/{print $2}' /proc/meminfo)*1024/$(nproc))) --timeout "$tmout"
done

and running this in another connection in a loop:

bash -c '
# an associative array whose values are ever-growing lists, so memory
# use grows roughly quadratically with the loop counter
declare -A arr;
b=();
for a in $(seq 7000);do
    b+=($a);
    arr["$a"]="${b[@]}";
done;
# hold the memory long enough for it to migrate to swap, then free it
# all at once when bash exits
sleep 60;
'

Generally, stress provides constant memory pressure, while the bash script builds up steady memory pressure, leaves some time for the data to migrate into swap, and then frees the data from swap all at once when bash dies.

A testing run takes about 1 h 18 min, or lasts until issues are noticed in the serial console log.



For a proper comparison, here are first results against plain v5.3.0-rc6, with no patch applied.

Almost every run hits this kind of issue, or some corruption. So far all of them have happened within a minute of starting the virtual machine, which I suspect masks some other issues.

[   22.570220] general protection fault: 0000 [#1] SMP PTI
[   22.571490] CPU: 2 PID: 151 Comm: kswapd0 Tainted: G        W         5.3.0-rc6 #82
[   22.573037] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[   22.574711] RIP: 0010:handle_to_buddy+0x20/0x30
[   22.575888] Code: 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 53 48 89 fb 83 e7 01 0f 85 01 28 00 00 48 8b 03 5b 48 89 c2 48 81 e2 00 f0 ff ff <0f> b6 92 ca 00 00 00 29 d0 83 e0 03 c3 0f 1f 00 0f 1f 44 00 00 41
[   22.579681] RSP: 0000:ffffb25540197820 EFLAGS: 00010206
[   22.580986] RAX: 00ffff8aa0c872f0 RBX: ffffec3dc021cbc0 RCX: 0000000000000000
[   22.585186] RDX: 00ffff8aa0c87000 RSI: ffff8aa0fe9d89c8 RDI: ffff8aa0fe9d89c8
[   22.586636] RBP: ffff8aa0c872f000 R08: ffff8aa0fe9d89c8 R09: 0000000000000000
[   22.588886] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aa0c872f001
[   22.590327] R13: ffff8aa0c872f010 R14: ffff8aa0fd3c8000 R15: ffffb25540197868
[   22.591804] FS:  0000000000000000(0000) GS:ffff8aa0fe800000(0000) knlGS:0000000000000000
[   22.593361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.594630] CR2: 00007f04cd125010 CR3: 000000000b212002 CR4: 0000000000160ee0
[   22.596126] Call Trace:
[   22.597007]  z3fold_zpool_map+0x76/0x110
[   22.598021]  zswap_writeback_entry+0x50/0x410
[   22.599206]  z3fold_zpool_shrink+0x3c4/0x540
[   22.600291]  zswap_frontswap_store+0x424/0x7c1
[   22.601352]  __frontswap_store+0xc4/0x162
[   22.602369]  swap_writepage+0x39/0x70
[   22.603380]  pageout.isra.0+0x12c/0x5d0
[   22.604395]  shrink_page_list+0x1124/0x1830
[   22.605417]  shrink_inactive_list+0x1da/0x460
[   22.606462]  shrink_node_memcg+0x202/0x770
[   22.607525]  shrink_node+0xdf/0x490
[   22.608457]  balance_pgdat+0x2db/0x580
[   22.609420]  kswapd+0x239/0x500
[   22.610299]  ? finish_wait+0x90/0x90
[   22.611263]  kthread+0x108/0x140
[   22.612191]  ? balance_pgdat+0x580/0x580
[   22.613212]  ? kthread_park+0x80/0x80
[   22.614213]  ret_from_fork+0x3a/0x50
[   22.615151] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net net_failover virtio_balloon failover intel_agp intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel virtio_console serio_raw virtio_blk agpgart qemu_fw_cfg
[   22.624384] ---[ end trace a16fd6fc7f1f14b7 ]---




I've now run my testing setup against the latest patch, z3fold5.diff, applied to v5.3.0-rc6 for about 12 hours; so far I have about 130 runs with kernel problems (and 33 without).

Here are the instances I believe are related to this issue. About 20% of runs did not trigger any issues.

The first instance accounts for about 40% of the total: console-1566937070.992386586.log

[  222.260461] BUG: unable to handle page fault for address: fffff1881c000028
[  222.264200] #PF: supervisor read access in kernel mode
[  222.264916] #PF: error_code(0x0000) - not-present page
[  222.265518] PGD 0 P4D 0 
[  222.265827] Oops: 0000 [#1] SMP PTI
[  222.266332] CPU: 3 PID: 150 Comm: kswapd0 Not tainted 5.3.0-rc6-z3fold5+ #81
[  222.268656] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[  222.270666] RIP: 0010:z3fold_zpool_map+0x52/0x110
[  222.275030] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d a8 e0 e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 86 e0 e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89
[  222.278274] RSP: 0000:ffffb7f64019f828 EFLAGS: 00010286
[  222.279018] RAX: 0000000000000000 RBX: fffff1881c000000 RCX: 0000000000000000
[  222.281214] RDX: 0000000080000000 RSI: ffff8fb906d7d348 RDI: ffff8fb935ec3e00
[  222.282627] RBP: 0000000000000000 R08: ffff8fb935ec3e00 R09: 0000000000000000
[  222.284076] R10: 0000000000000002 R11: 0000000000000000 R12: ffff8fb906d7d348
[  222.290517] R13: ffff8fb935ec3ea0 R14: ffff8fb935ec3e08 R15: ffffb7f64019f868
[  222.293142] FS:  0000000000000000(0000) GS:ffff8fb93ea00000(0000) knlGS:0000000000000000
[  222.294341] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.295141] CR2: fffff1881c000028 CR3: 0000000039122001 CR4: 0000000000160ee0
[  222.296423] Call Trace:
[  222.297022]  zswap_writeback_entry+0x50/0x410
[  222.297784]  z3fold_zpool_shrink+0x299/0x540
[  222.298574]  zswap_frontswap_store+0x424/0x7c1
[  222.300394]  __frontswap_store+0xc4/0x162
[  222.301310]  swap_writepage+0x39/0x70
[  222.302637]  pageout.isra.0+0x12c/0x5d0
[  222.303813]  shrink_page_list+0x1124/0x1830
[  222.305109]  shrink_inactive_list+0x1da/0x460
[  222.305872]  shrink_node_memcg+0x202/0x770
[  222.310596]  shrink_node+0xdf/0x490
[  222.311023]  balance_pgdat+0x2db/0x580
[  222.311623]  kswapd+0x239/0x500
[  222.312068]  ? finish_wait+0x90/0x90
[  222.312530]  kthread+0x108/0x140
[  222.312929]  ? balance_pgdat+0x580/0x580
[  222.313411]  ? kthread_park+0x80/0x80
[  222.313903]  ret_from_fork+0x3a/0x50
[  222.314360] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon virtio_net intel_agp net_failover failover intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_console virtio_blk agpgart qemu_fw_cfg
[  222.327514] CR2: fffff1881c000028
[  222.329786] ---[ end trace a109466dffbff13d ]---


This instance accounts for about 30%: console-1566935017.712375433.log

[  682.535670] BUG: unable to handle page fault for address: ffffd27f4c000028
[  682.537141] #PF: supervisor read access in kernel mode
[  682.538251] #PF: error_code(0x0000) - not-present page
[  682.539543] PGD 0 P4D 0 
[  682.540126] Oops: 0000 [#1] SMP PTI
[  682.540900] CPU: 3 PID: 1547 Comm: stress Not tainted 5.3.0-rc6-z3fold5+ #81
[  682.548470] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[  682.552506] RIP: 0010:z3fold_zpool_map+0x52/0x110
[  682.553799] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d a8 e0 e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 86 e0 e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89
[  682.562259] RSP: 0000:ffffa689006cb3f8 EFLAGS: 00010286
[  682.563680] RAX: 0000000000000000 RBX: ffffd27f4c000000 RCX: 0000000000000000
[  682.568269] RDX: 0000000080000000 RSI: ffff980d098a9658 RDI: ffff980d3346ee00
[  682.571681] RBP: 0000000000000000 R08: ffff980d3346ee00 R09: 0000000000000000
[  682.573631] R10: 0000000000000003 R11: 0000000000000000 R12: ffff980d098a9658
[  682.577567] R13: ffff980d3346eea0 R14: ffff980d3346ee08 R15: ffffa689006cb438
[  682.579231] FS:  00007f08a76b7740(0000) GS:ffff980d3ea00000(0000) knlGS:0000000000000000
[  682.583664] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  682.589120] CR2: ffffd27f4c000028 CR3: 000000001577a002 CR4: 0000000000160ee0
[  682.590708] Call Trace:
[  682.591233]  zswap_writeback_entry+0x50/0x410
[  682.592665]  z3fold_zpool_shrink+0x4a7/0x540
[  682.593796]  zswap_frontswap_store+0x424/0x7c1
[  682.594883]  __frontswap_store+0xc4/0x162
[  682.596394]  swap_writepage+0x39/0x70
[  682.598313]  pageout.isra.0+0x12c/0x5d0
[  682.599308]  shrink_page_list+0x1124/0x1830
[  682.601946]  shrink_inactive_list+0x1da/0x460
[  682.603244]  shrink_node_memcg+0x202/0x770
[  682.604275]  ? sched_clock_cpu+0xc/0xc0
[  682.607826]  shrink_node+0xdf/0x490
[  682.608563]  do_try_to_free_pages+0xdb/0x3c0
[  682.609460]  try_to_free_pages+0x112/0x2e0
[  682.610394]  __alloc_pages_slowpath+0x422/0x1000
[  682.615474]  ? __lock_acquire+0x247/0x1900
[  682.616335]  __alloc_pages_nodemask+0x37f/0x400
[  682.617352]  alloc_pages_vma+0xcc/0x170
[  682.618158]  __read_swap_cache_async+0x1e9/0x3e0
[  682.619120]  swap_cluster_readahead+0x184/0x330
[  682.620113]  ? find_held_lock+0x32/0x90
[  682.621510]  swapin_readahead+0x2b4/0x4e0
[  682.622901]  ? sched_clock_cpu+0xc/0xc0
[  682.624315]  do_swap_page+0x3ac/0xc20
[  682.630673]  __handle_mm_fault+0x8da/0x1900
[  682.632099]  handle_mm_fault+0x159/0x340
[  682.634693]  do_user_addr_fault+0x1fe/0x480
[  682.640755]  do_page_fault+0x31/0x210
[  682.642182]  page_fault+0x3e/0x50
[  682.643473] RIP: 0033:0x555bbbc05298
[  682.644762] Code: 7e 01 00 00 89 df e8 47 e1 ff ff 44 8b 2d 84 4d 00 00 4d 85 ff 7e 40 31 c0 eb 0f 0f 1f 80 00 00 00 00 4c 01 f0 49 39 c7 7e 2d <80> 7c 05 00 5a 4c 8d 54 05 00 74 ec 4c 89 14 24 45 85 ed 0f 89 de
[  682.652781] RSP: 002b:00007ffd7bb940f0 EFLAGS: 00010206
[  682.657114] RAX: 0000000003045000 RBX: ffffffffffffffff RCX: 00007f08a77b0156
[  682.659673] RDX: 0000000000000000 RSI: 000000000ae28000 RDI: 0000000000000000
[  682.663818] RBP: 00007f089c88f010 R08: 00007f089c88f010 R09: 0000000000000000
[  682.670702] R10: 00007f089f8d3010 R11: 0000000000000246 R12: 0000555bbbc07004
[  682.672793] R13: 0000000000000002 R14: 0000000000001000 R15: 000000000ae27800
[  682.674792] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net virtio_balloon net_failover intel_agp failover intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw virtio_blk virtio_console agpgart qemu_fw_cfg
[  682.702373] CR2: ffffd27f4c000028
[  682.704259] ---[ end trace 732fd3aa7ff23a1f ]---


And a third issue, seen once: console-1566964471.551922729.log

[  180.825545] general protection fault: 0000 [#1] SMP PTI
[  180.826414] CPU: 2 PID: 1416 Comm: stress Not tainted 5.3.0-rc6-z3fold5+ #81
[  180.827267] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[  180.828293] RIP: 0010:z3fold_zpool_map+0x52/0x110
[  180.828863] Code: e8 48 01 ea 0f 82 ca 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d a8 e0 e4 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 86 e0 e4 00 <48> 8b 53 28 83 e2 01 74 07 5b 5d 41 5c 41 5d c3 4c 8d 6d 10 4c 89
[  180.831767] RSP: 0000:ffffba55009b73f8 EFLAGS: 00010203
[  180.832866] RAX: 093e12473dc06000 RBX: 0024e01c70f70180 RCX: 0000000000000000
[  180.834964] RDX: 093e1247bdc06000 RSI: ffff8f2b33859740 RDI: ffff8f2b392ad400
[  180.836998] RBP: 093e12473dc06000 R08: ffff8f2b392ad400 R09: 0000000000000000
[  180.838923] R10: 0000000000000003 R11: 0000000000000000 R12: ffff8f2b33859740
[  180.840437] R13: ffff8f2b392ad4a0 R14: ffff8f2b392ad408 R15: ffffba55009b7438
[  180.842929] FS:  00007f3cfae70740(0000) GS:ffff8f2b3a800000(0000) knlGS:0000000000000000
[  180.845797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  180.847412] CR2: 00007f3cf6657010 CR3: 000000002bf58005 CR4: 0000000000160ee0
[  180.849257] Call Trace:
[  180.849781]  zswap_writeback_entry+0x50/0x410
[  180.850897]  z3fold_zpool_shrink+0x4a7/0x540
[  180.852029]  zswap_frontswap_store+0x424/0x7c1
[  180.852960]  __frontswap_store+0xc4/0x162
[  180.853795]  swap_writepage+0x39/0x70
[  180.854636]  pageout.isra.0+0x12c/0x5d0
[  180.855424]  shrink_page_list+0x1124/0x1830
[  180.856361]  shrink_inactive_list+0x1da/0x460
[  180.857257]  ? lruvec_lru_size+0x50/0x130
[  180.858524]  shrink_node_memcg+0x202/0x770
[  180.859772]  ? sched_clock_cpu+0xc/0xc0
[  180.860660]  shrink_node+0xdf/0x490
[  180.861385]  do_try_to_free_pages+0xdb/0x3c0
[  180.862343]  try_to_free_pages+0x112/0x2e0
[  180.863316]  __alloc_pages_slowpath+0x422/0x1000
[  180.864685]  ? __lock_acquire+0x247/0x1900
[  180.865752]  __alloc_pages_nodemask+0x37f/0x400
[  180.866682]  alloc_pages_vma+0xcc/0x170
[  180.867611]  __read_swap_cache_async+0x1e9/0x3e0
[  180.868924]  swap_cluster_readahead+0x184/0x330
[  180.869998]  ? find_held_lock+0x32/0x90
[  180.871064]  swapin_readahead+0x2b4/0x4e0
[  180.872281]  ? sched_clock_cpu+0xc/0xc0
[  180.873287]  do_swap_page+0x3ac/0xc20
[  180.874365]  __handle_mm_fault+0x8da/0x1900
[  180.875611]  handle_mm_fault+0x159/0x340
[  180.876420]  do_user_addr_fault+0x1fe/0x480
[  180.877584]  do_page_fault+0x31/0x210
[  180.879220]  page_fault+0x3e/0x50
[  180.880716] RIP: 0033:0x55e1f4116298
[  180.882034] Code: 7e 01 00 00 89 df e8 47 e1 ff ff 44 8b 2d 84 4d 00 00 4d 85 ff 7e 40 31 c0 eb 0f 0f 1f 80 00 00 00 00 4c 01 f0 49 39 c7 7e 2d <80> 7c 05 00 5a 4c 8d 54 05 00 74 ec 4c 89 14 24 45 85 ed 0f 89 de
[  180.902338] RSP: 002b:00007ffe1e2b1520 EFLAGS: 00010206
[  180.907398] RAX: 0000000004f6b000 RBX: ffffffffffffffff RCX: 00007f3cfaf69156
[  180.910820] RDX: 0000000000000000 RSI: 000000000a75d000 RDI: 0000000000000000
[  180.913132] RBP: 00007f3cf0713010 R08: 00007f3cf0713010 R09: 0000000000000000
[  180.920294] R10: 00007f3cf567d010 R11: 0000000000000246 R12: 000055e1f4118004
[  180.927246] R13: 0000000000000002 R14: 0000000000001000 R15: 000000000a75c800
[  180.933896] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net virtio_balloon net_failover failover intel_agp intel_gtt qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crc32c_intel serio_raw drm virtio_blk virtio_console qemu_fw_cfg agpgart
[  180.966032] ---[ end trace 7509e00655c549dc ]---


Compared to plain v5.3.0-rc6 (about 85 issues per hour), v5.3.0-rc6+z3fold5.diff produces about 10.5 issues per hour, i.e. only about 1/8 as many.


There are more issues, but their signatures do not match this bug. This big issue usually happens before the smaller ones and masks them.
Comment 14 Markus Linnala 2019-08-28 09:18:12 UTC
Created attachment 284647 [details]
console-1566937070.992386586.log v5.3.0-rc6+z3fold5.diff
Comment 15 Markus Linnala 2019-08-28 09:18:46 UTC
Created attachment 284649 [details]
console-1566935017.712375433.log v5.3.0-rc6+z3fold5.diff
Comment 16 Markus Linnala 2019-08-28 09:19:10 UTC
Created attachment 284651 [details]
console-1566964471.551922729.log v5.3.0-rc6+z3fold5.diff
Comment 17 Markus Linnala 2019-08-28 09:19:30 UTC
Created attachment 284653 [details]
dmesg-boot-5.3.0-rc6-z3fold5+-1566964471.551922729.txt
Comment 18 Markus Linnala 2019-08-28 09:19:44 UTC
Created attachment 284655 [details]
symbols_5.3.0_rc6_z3fold5.txt
Comment 19 Markus Linnala 2019-08-28 09:22:13 UTC
Created attachment 284657 [details]
console-1566979205.492526522.log v5.3.0-rc6
Comment 20 Markus Linnala 2019-08-28 09:22:35 UTC
Created attachment 284659 [details]
dmesg-boot-5.3.0-rc6-1566979205.492526522.txt
Comment 21 Markus Linnala 2019-08-28 09:22:56 UTC
Created attachment 284661 [details]
symbols_5.3.0-rc6.txt
Comment 22 Markus Linnala 2019-08-28 11:08:40 UTC
Created attachment 284663 [details]
symbols_5.3.0-rc6.txt fixed

The previous attachment had the wrong symbols.
Comment 23 Chris Murphy 2019-08-28 16:26:27 UTC
Created attachment 284665 [details]
patch3 journal

(In reply to Vitaly from comment #12)
> The issue seems to be related to improper handling of retries in
> z3fold_reclaim_page(). Chris, would you please try the new patch?

I decided to keep everything exactly the same as with patch2 (same system, same test, same kernel) just to help keep the non-determinism semi-sane. I reverted patch2 and applied this patch3.

The test begins somewhere between 75s and 396s monotonic time. There is a call trace at 615s, though I can't tell whether it's related to any of this. The system starts becoming intermittently unresponsive in the GUI (ssh is OK), so I issued sysrq+t at 807s. Another possibly unrelated call trace appears at 864s, and then an oops at 920s which does look zswap-related. I took another sysrq+t at 1076s and then killed off the test.

Complete kernel messages attached.
Comment 24 Vitaly 2019-09-08 11:05:39 UTC
Created attachment 284883 [details]
Proposed patch (verified)

This one works for me in a replica of Markus's setup.
