Bug 192571 - zswap + zram enabled BUG
Summary: zswap + zram enabled BUG
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-14 17:31 UTC by Gluzskiy Alexandr
Modified: 2021-03-03 14:28 UTC
CC List: 8 users

See Also:
Kernel Version: 4.9.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Gluzskiy Alexandr 2017-01-14 17:31:42 UTC

    
Comment 1 Gluzskiy Alexandr 2017-01-14 17:32:04 UTC
[199961.576604] ------------[ cut here ]------------
[199961.577830] kernel BUG at mm/zswap.c:1108!
[199961.579006] invalid opcode: 0000 [#1] SMP
[199961.580166] Modules linked in: uvcvideo gspca_zc3xx xt_sctp zram ccm act_mirred ifb sch_ingress cls_u32 sch_sfq sch_htb nf_conntrack_netlink nfnetlink sit tunnel4 ip_tunnel iptable_mangle ipt_REJECT nf_reject_ipv4 xt_recent xt_TCPMSS nf_conntrack_ipv6 nf_defrag_ipv6 iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack xt_nat xt_tcpudp xt_multiport ip6table_filter ip6table_raw ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ip_tables x_tables radeon ath9k led_class ath9k_common i2c_algo_bit btrfs ath9k_hw ttm mac80211 drm_kms_helper xor snd_hda_codec_via snd_hda_codec_generic cfbfillrect ath syscopyarea cfbimgblt cfg80211 sysfillrect sysimgblt fb_sys_fops cfbcopyarea rfkill drm snd_hda_intel snd_hda_codec r8169 xhci_pci xhci_hcd backlight
[199961.587843]  parport_pc ohci_pci raid6_pq snd_hda_core mii button ohci_hcd asus_atk0110 i2c_piix4 acpi_cpufreq processor sch_fq_codel br_netfilter bridge stp llc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_pcm snd_rawmidi snd_seq_device snd_timer snd soundcore vhost_net tun nfsd vhost macvtap auth_rpcgss macvlan oid_registry nfs_acl lockd grace kvm_amd kvm irqbypass gspca_main v4l2_common k10temp hwmon videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_core i2c_core parport fbcon bitblit softcursor fb fbdev font sunrpc autofs4 [last unloaded: uvcvideo]
[199961.594974] CPU: 2 PID: 2755 Comm: syncthing Not tainted 4.9.2 #4
[199961.596459] Hardware name: System manufacturer System Product Name/M4A77TD, BIOS 2104    06/28/2010
[199961.597974] task: ffff880035c19680 task.stack: ffffc90000510000
[199961.599490] RIP: 0010:[<ffffffff8112c6c2>]  [<ffffffff8112c6c2>] zswap_frontswap_load+0x142/0x160
[199961.601042] RSP: 0000:ffffc90000513cb0  EFLAGS: 00010282
[199961.602588] RAX: ffffffff818263a0 RBX: ffff88036b2fb930 RCX: ffffc90000513c98
[199961.604141] RDX: ffff8801a75da000 RSI: ffff8802ee9d0240 RDI: ffff88041c25d000
[199961.605687] RBP: ffff880035c19680 R08: ffff8802ee9d0249 R09: ffff8801a75da0ac
[199961.607240] R10: ffff8801a75db000 R11: ffff8802ee9d027d R12: 00000000ffffffea
[199961.608788] R13: ffff8804176e3830 R14: ffff8804176e3838 R15: 000000c42a6ac008
[199961.610315] FS:  00007fe1fa7fc700(0000) GS:ffff88042fc80000(0000) knlGS:0000000000000000
[199961.611838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199961.613357] CR2: 000000c42a6ac008 CR3: 00000000a8411000 CR4: 00000000000006e0
[199961.614864] Stack:
[199961.616327]  00001000fffffffe ffffffff81823780 000000000014487b ffffea00069d7680
[199961.617820]  0000000000000001 ffff880418614c00 ffffffff8112b768 ffffea00069d7680
[199961.619310]  ffff880418614c00 000000000014487b 00000000024200ca ffff88011d834900
[199961.620799] Call Trace:
[199961.622250]  [<ffffffff8112b768>] ? __frontswap_load+0x68/0xc0
[199961.623689]  [<ffffffff8112666c>] ? swap_readpage+0x8c/0x120
[199961.625115]  [<ffffffff81126e61>] ? read_swap_cache_async+0x21/0x40
[199961.626545]  [<ffffffff81126f96>] ? swapin_readahead+0x116/0x1e0
[199961.627973]  [<ffffffff812b704e>] ? radix_tree_lookup_slot+0xe/0x20
[199961.629398]  [<ffffffff8111236f>] ? do_swap_page+0x42f/0x660
[199961.630799]  [<ffffffff81114bca>] ? handle_mm_fault+0x76a/0x1080
[199961.632163]  [<ffffffff811544ec>] ? new_sync_read+0xac/0xe0
[199961.633496]  [<ffffffff8102c7a9>] ? __do_page_fault+0x169/0x3e0
[199961.634798]  [<ffffffff8102ca5b>] ? do_page_fault+0x1b/0x60
[199961.636106]  [<ffffffff810e3cc9>] ? __context_tracking_exit.part.1+0x49/0x60
[199961.637424]  [<ffffffff8157c7cf>] ? page_fault+0x1f/0x30
[199961.638739] Code: fb ff ff 41 c6 45 08 00 48 83 c4 08 44 89 e0 5b 5d 41 5c 41 5d 41 5e c3 be 0f 00 00 00 48 c7 c7 12 d6 6d 81 e8 e0 d2 f1 ff eb b0 <0f> 0b 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 
[199961.641538] RIP  [<ffffffff8112c6c2>] zswap_frontswap_load+0x142/0x160
[199961.642922]  RSP <ffffc90000513cb0>
[199961.648971] ---[ end trace 76742a0cd4818a78 ]---
Comment 2 Andrew Morton 2017-01-17 20:22:55 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 14 Jan 2017 17:32:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=192571
> 
> --- Comment #1 from Gluzskiy Alexandr <sss123next@list.ru> ---
> [199961.576604] ------------[ cut here ]------------
> [199961.577830] kernel BUG at mm/zswap.c:1108!
> [199961.579006] invalid opcode: 0000 [#1] SMP
> [199961.580166] Modules linked in: uvcvideo gspca_zc3xx xt_sctp zram ccm
> act_mirred ifb sch_ingress cls_u32 sch_sfq sch_htb nf_conntrack_netlink
> nfnetlink sit tunnel4 ip_tunnel iptable_mangle ipt_REJECT nf_reject_ipv4
> xt_recent xt_TCPMSS nf_conntrack_ipv6 nf_defrag_ipv6 iptable_filter
> ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack xt_nat xt_tcpudp
> xt_multiport ip6table_filter ip6table_raw ip6_tables iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw
> ip_tables x_tables radeon ath9k led_class ath9k_common i2c_algo_bit btrfs
> ath9k_hw ttm mac80211 drm_kms_helper xor snd_hda_codec_via
> snd_hda_codec_generic cfbfillrect ath syscopyarea cfbimgblt cfg80211
> sysfillrect sysimgblt fb_sys_fops cfbcopyarea rfkill drm snd_hda_intel
> snd_hda_codec r8169 xhci_pci xhci_hcd backlight
> [199961.587843]  parport_pc ohci_pci raid6_pq snd_hda_core mii button
> ohci_hcd
> asus_atk0110 i2c_piix4 acpi_cpufreq processor sch_fq_codel br_netfilter
> bridge
> stp llc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_pcm snd_rawmidi
> snd_seq_device snd_timer snd soundcore vhost_net tun nfsd vhost macvtap
> auth_rpcgss macvlan oid_registry nfs_acl lockd grace kvm_amd kvm irqbypass
> gspca_main v4l2_common k10temp hwmon videobuf2_vmalloc videobuf2_memops
> videobuf2_v4l2 videodev videobuf2_core i2c_core parport fbcon bitblit
> softcursor fb fbdev font sunrpc autofs4 [last unloaded: uvcvideo]
> [199961.594974] CPU: 2 PID: 2755 Comm: syncthing Not tainted 4.9.2 #4
> [199961.596459] Hardware name: System manufacturer System Product
> Name/M4A77TD,
> BIOS 2104    06/28/2010
> [199961.597974] task: ffff880035c19680 task.stack: ffffc90000510000
> [199961.599490] RIP: 0010:[<ffffffff8112c6c2>]  [<ffffffff8112c6c2>]
> zswap_frontswap_load+0x142/0x160
> [199961.601042] RSP: 0000:ffffc90000513cb0  EFLAGS: 00010282
> [199961.602588] RAX: ffffffff818263a0 RBX: ffff88036b2fb930 RCX:
> ffffc90000513c98
> [199961.604141] RDX: ffff8801a75da000 RSI: ffff8802ee9d0240 RDI:
> ffff88041c25d000
> [199961.605687] RBP: ffff880035c19680 R08: ffff8802ee9d0249 R09:
> ffff8801a75da0ac
> [199961.607240] R10: ffff8801a75db000 R11: ffff8802ee9d027d R12:
> 00000000ffffffea
> [199961.608788] R13: ffff8804176e3830 R14: ffff8804176e3838 R15:
> 000000c42a6ac008
> [199961.610315] FS:  00007fe1fa7fc700(0000) GS:ffff88042fc80000(0000)
> knlGS:0000000000000000
> [199961.611838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199961.613357] CR2: 000000c42a6ac008 CR3: 00000000a8411000 CR4:
> 00000000000006e0
> [199961.614864] Stack:
> [199961.616327]  00001000fffffffe ffffffff81823780 000000000014487b
> ffffea00069d7680
> [199961.617820]  0000000000000001 ffff880418614c00 ffffffff8112b768
> ffffea00069d7680
> [199961.619310]  ffff880418614c00 000000000014487b 00000000024200ca
> ffff88011d834900
> [199961.620799] Call Trace:
> [199961.622250]  [<ffffffff8112b768>] ? __frontswap_load+0x68/0xc0
> [199961.623689]  [<ffffffff8112666c>] ? swap_readpage+0x8c/0x120
> [199961.625115]  [<ffffffff81126e61>] ? read_swap_cache_async+0x21/0x40
> [199961.626545]  [<ffffffff81126f96>] ? swapin_readahead+0x116/0x1e0
> [199961.627973]  [<ffffffff812b704e>] ? radix_tree_lookup_slot+0xe/0x20
> [199961.629398]  [<ffffffff8111236f>] ? do_swap_page+0x42f/0x660
> [199961.630799]  [<ffffffff81114bca>] ? handle_mm_fault+0x76a/0x1080
> [199961.632163]  [<ffffffff811544ec>] ? new_sync_read+0xac/0xe0
> [199961.633496]  [<ffffffff8102c7a9>] ? __do_page_fault+0x169/0x3e0
> [199961.634798]  [<ffffffff8102ca5b>] ? do_page_fault+0x1b/0x60
> [199961.636106]  [<ffffffff810e3cc9>] ?
> __context_tracking_exit.part.1+0x49/0x60
> [199961.637424]  [<ffffffff8157c7cf>] ? page_fault+0x1f/0x30
> [199961.638739] Code: fb ff ff 41 c6 45 08 00 48 83 c4 08 44 89 e0 5b 5d 41
> 5c
> 41 5d 41 5e c3 be 0f 00 00 00 48 c7 c7 12 d6 6d 81 e8 e0 d2 f1 ff eb b0 <0f>
> 0b
> 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 
> [199961.641538] RIP  [<ffffffff8112c6c2>] zswap_frontswap_load+0x142/0x160
> [199961.642922]  RSP <ffffc90000513cb0>
> [199961.648971] ---[ end trace 76742a0cd4818a78 ]---
Comment 3 Gluzskiy Alexandr 2017-01-18 06:07:38 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Wed, 18/01/2017 at 10:39 +0900, Sergey Senozhatsky wrote:
> Cc Dan
> 
> On (01/17/17 12:22), Andrew Morton wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=192571
> > > 
> > > --- Comment #1 from Gluzskiy Alexandr <sss123next@list.ru> ---
> > > [199961.576604] ------------[ cut here ]------------
> > > [199961.577830] kernel BUG at mm/zswap.c:1108!
> 
> zswap didn't manage to decompress the page:
> 
> static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>                               struct page *page)
> {
> ...
>       dst = kmap_atomic(page);
>       tfm = *get_cpu_ptr(entry->pool->tfm);
>       ret = crypto_comp_decompress(tfm, src, entry->length, dst,
> &dlen);
>       put_cpu_ptr(entry->pool->tfm);
>       kunmap_atomic(dst);
>       zpool_unmap_handle(entry->pool->zpool, entry->handle);
>       BUG_ON(ret);
>       ^^^^^^^^^^^
> 
> is there anything suspicious in dmesg?
> 
>       -ss
> 
> [..]
> > > [199961.596459] Hardware name: System manufacturer System Product
> > > Name/M4A77TD,
> > > BIOS 2104    06/28/2010
> > > [199961.597974] task: ffff880035c19680 task.stack:
> > > ffffc90000510000
> > > [199961.599490] RIP:
> > > 0010:[<ffffffff8112c6c2>]  [<ffffffff8112c6c2>]
> > > zswap_frontswap_load+0x142/0x160
> > > [199961.601042] RSP: 0000:ffffc90000513cb0  EFLAGS: 00010282
> > > [199961.602588] RAX: ffffffff818263a0 RBX: ffff88036b2fb930 RCX:
> > > ffffc90000513c98
> > > [199961.604141] RDX: ffff8801a75da000 RSI: ffff8802ee9d0240 RDI:
> > > ffff88041c25d000
> > > [199961.605687] RBP: ffff880035c19680 R08: ffff8802ee9d0249 R09:
> > > ffff8801a75da0ac
> > > [199961.607240] R10: ffff8801a75db000 R11: ffff8802ee9d027d R12:
> > > 00000000ffffffea
> > > [199961.608788] R13: ffff8804176e3830 R14: ffff8804176e3838 R15:
> > > 000000c42a6ac008
> > > [199961.610315] FS:  00007fe1fa7fc700(0000)
> > > GS:ffff88042fc80000(0000)
> > > knlGS:0000000000000000
> > > [199961.611838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199961.613357] CR2: 000000c42a6ac008 CR3: 00000000a8411000 CR4:
> > > 00000000000006e0
> > > [199961.614864] Stack:
> > > [199961.616327]  00001000fffffffe ffffffff81823780
> > > 000000000014487b
> > > ffffea00069d7680
> > > [199961.617820]  0000000000000001 ffff880418614c00
> > > ffffffff8112b768
> > > ffffea00069d7680
> > > [199961.619310]  ffff880418614c00 000000000014487b
> > > 00000000024200ca
> > > ffff88011d834900
> > > [199961.620799] Call Trace:
> > > [199961.622250]  [<ffffffff8112b768>] ?
> > > __frontswap_load+0x68/0xc0
> > > [199961.623689]  [<ffffffff8112666c>] ? swap_readpage+0x8c/0x120
> > > [199961.625115]  [<ffffffff81126e61>] ?
> > > read_swap_cache_async+0x21/0x40
> > > [199961.626545]  [<ffffffff81126f96>] ?
> > > swapin_readahead+0x116/0x1e0
> > > [199961.627973]  [<ffffffff812b704e>] ?
> > > radix_tree_lookup_slot+0xe/0x20
> > > [199961.629398]  [<ffffffff8111236f>] ? do_swap_page+0x42f/0x660
> > > [199961.630799]  [<ffffffff81114bca>] ?
> > > handle_mm_fault+0x76a/0x1080
> > > [199961.632163]  [<ffffffff811544ec>] ? new_sync_read+0xac/0xe0
> > > [199961.633496]  [<ffffffff8102c7a9>] ?
> > > __do_page_fault+0x169/0x3e0
> > > [199961.634798]  [<ffffffff8102ca5b>] ? do_page_fault+0x1b/0x60
> > > [199961.636106]  [<ffffffff810e3cc9>] ?
> > > __context_tracking_exit.part.1+0x49/0x60
> > > [199961.637424]  [<ffffffff8157c7cf>] ? page_fault+0x1f/0x30
> > > [199961.638739] Code: fb ff ff 41 c6 45 08 00 48 83 c4 08 44 89
> > > e0 5b 5d 41 5c
> > > 41 5d 41 5e c3 be 0f 00 00 00 48 c7 c7 12 d6 6d 81 e8 e0 d2 f1 ff
> > > eb b0 <0f> 0b
> > > 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 
> > > [199961.641538] RIP  [<ffffffff8112c6c2>]
> > > zswap_frontswap_load+0x142/0x160
> > > [199961.642922]  RSP <ffffc90000513cb0>
> > > [199961.648971] ---[ end trace 76742a0cd4818a78 ]---

No, nothing interesting in dmesg, but I suspect it may be caused by
using zram and zswap together.
I have the following configuration:
1. boot option to kernel "ro radeon.audio=0 dma_debug=off reboot=warm
gbpages rootfstype=ext4
rootflags=relatime,user_xattr,journal_async_commit,delalloc,nobarrier
zswap.enabled=1"
2. zram activation in userspace:
"
cat /etc/local.d/zram.start 
#!/bin/sh

modprobe zram
echo 10G > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon --priority 200 /dev/zram0
"

3. I also have a normal swap block device as a fallback if all memory is used
"
swapon
NAME             TYPE      SIZE USED PRIO
/dev/mapper/swap partition  16G   0B    0
/dev/zram0       partition  10G   1G  200
"

So maybe the problem is related to zswap on zram? Just a guess...
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEl/sA7WQg6czXWI/dsEApXVthX7wFAlh/BBIACgkQsEApXVth
X7w9lw/+O55XK/YZHszD/DMKRuZaaAQz7to/JrkJOCOJaYsV/PpUBh6liqYH8LCV
6vYaavzKt3ICW1qRa6Wjj7QC2YZKZTe8i8ERGTamDOnSu/gMlJz3EQ/uOEsNxde5
eoJr9n+JtUqf0PUUaMc61FcRbePcb3csQDD7KAwMSO7Q7+uP/osFUApjFVBOv0yd
KggONcuyIlE0CIhmMk31Id+C7XoKeJogHa2qTIolGzi+yLCmiL+q+CujfXfrbOAz
N6mDr7v6RTwzzOyXULZahceVxVtpUSgj84HG9wxTF7dwN6kwbW/YtdMu7UruqRyb
SYHauUQSuEcbyb5m7tAPWfy4WsWaTacscdBCrOVqYJcn0nb945RMDz0RPIFZmLQS
da6/zh67UF9KuSgprVakvgQ/ITJOfd96USlwZ+E8icJzT36IPWkSmFe6pNEa+KMn
FiUf0JPN6ivO2q2wuwkIEKIeLiqDNX7QwcMxowMHKxezZobrzdyd4LoLx143mAa/
Ls0nABaN9bk+jzl3Ffl2Vx7YowuercwGaRzBuPEdxVQflA1gVPi7o/zwJ75CPAre
ntQk8nWAqpxB30s0/++xYPbYaJFqWtXM2e4AQKQjiZSAdq34yl+q+di/1iGS/u4Q
gfvGaprAtViK6AqURT8dXrWTv8KzAT2prIs3wdpmrc3V92p1cAo=
=5ZmQ
-----END PGP SIGNATURE-----
Comment 4 Michal Hocko 2017-01-18 13:17:20 UTC
On Wed 18-01-17 10:39:48, Sergey Senozhatsky wrote:
> Cc Dan
> 
> On (01/17/17 12:22), Andrew Morton wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=192571
> > > 
> > > --- Comment #1 from Gluzskiy Alexandr <sss123next@list.ru> ---
> > > [199961.576604] ------------[ cut here ]------------
> > > [199961.577830] kernel BUG at mm/zswap.c:1108!
> 
> zswap didn't manage to decompress the page:
> 
> static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>                               struct page *page)
> {
> ...
>       dst = kmap_atomic(page);
>       tfm = *get_cpu_ptr(entry->pool->tfm);
>       ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
>       put_cpu_ptr(entry->pool->tfm);
>       kunmap_atomic(dst);
>       zpool_unmap_handle(entry->pool->zpool, entry->handle);
>       BUG_ON(ret);
>       ^^^^^^^^^^^

Ugh, why do we even do that? This is not the way to handle error
situations. AFAIU propagating the error out wouldn't be a big deal
because we would just fall back to regular swap, right?
Comment 5 Dan Streetman 2017-01-19 01:37:16 UTC
On Wed, Jan 18, 2017 at 8:17 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 18-01-17 10:39:48, Sergey Senozhatsky wrote:
>> Cc Dan
>>
>> On (01/17/17 12:22), Andrew Morton wrote:
>> > > https://bugzilla.kernel.org/show_bug.cgi?id=192571
>> > >
>> > > --- Comment #1 from Gluzskiy Alexandr <sss123next@list.ru> ---
>> > > [199961.576604] ------------[ cut here ]------------
>> > > [199961.577830] kernel BUG at mm/zswap.c:1108!
>>
>> zswap didn't manage to decompress the page:
>>
>> static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>>                               struct page *page)
>> {
>> ...
>>       dst = kmap_atomic(page);
>>       tfm = *get_cpu_ptr(entry->pool->tfm);
>>       ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
>>       put_cpu_ptr(entry->pool->tfm);
>>       kunmap_atomic(dst);
>>       zpool_unmap_handle(entry->pool->zpool, entry->handle);
>>       BUG_ON(ret);
>>       ^^^^^^^^^^^
>
> Ugh, why do we even do that? This is not the way how to handle error
> situations. AFAIU propagating the error out wouldn't be a big deal
> because we would just fallback to regular swap, right?

yeah this function definitely should never bug; it's just a callback
from the zpool to try to write a page back to the swapcache so the
zpool can free a page.  It's definitely ok for it to return an error.

> --
> Michal Hocko
> SUSE Labs
Comment 6 Dan Streetman 2017-01-19 02:57:45 UTC
On Wed, Jan 18, 2017 at 12:58 AM, Alexandr <sss123next@list.ru> wrote:
> no, nothing interesting in dmesg. but i suspect what it may be because
> of usage zram and zswap together.
> i have following configuration:
> 1. boot option to kernel "ro radeon.audio=0 dma_debug=off reboot=warm
> gbpages rootfstype=ext4
> rootflags=relatime,user_xattr,journal_async_commit,delalloc,nobarrier
> zswap.enabled=1"
> 2. zram activation in userspace:
> "
> cat /etc/local.d/zram.start
> #!/bin/sh
>
> modprobe zram
> echo 10G > /sys/block/zram0/disksize
> mkswap /dev/zram0
> swapon --priority 200 /dev/zram0

Why would you do this?  There's no benefit of using zswap together with zram.


> "
>
> 3. also i have normal swap block device as fall back if all memory used
> "
> swapon
> NAME             TYPE      SIZE USED PRIO
> /dev/mapper/swap partition  16G   0B    0
> /dev/zram0       partition  10G   1G  200
> "
>
> so, maybe problem related to zswap on zram ? just a guess...

I think it's unlikely, but it's hard to tell exactly why the page
couldn't be uncompressed; my guess would be more likely a bug in the
zpool backend.  Were you using the default (zbud)?

> -----BEGIN PGP SIGNATURE-----
>
> iQIzBAEBCgAdFiEEl/sA7WQg6czXWI/dsEApXVthX7wFAlh/BBIACgkQsEApXVth
> X7w9lw/+O55XK/YZHszD/DMKRuZaaAQz7to/JrkJOCOJaYsV/PpUBh6liqYH8LCV
> 6vYaavzKt3ICW1qRa6Wjj7QC2YZKZTe8i8ERGTamDOnSu/gMlJz3EQ/uOEsNxde5
> eoJr9n+JtUqf0PUUaMc61FcRbePcb3csQDD7KAwMSO7Q7+uP/osFUApjFVBOv0yd
> KggONcuyIlE0CIhmMk31Id+C7XoKeJogHa2qTIolGzi+yLCmiL+q+CujfXfrbOAz
> N6mDr7v6RTwzzOyXULZahceVxVtpUSgj84HG9wxTF7dwN6kwbW/YtdMu7UruqRyb
> SYHauUQSuEcbyb5m7tAPWfy4WsWaTacscdBCrOVqYJcn0nb945RMDz0RPIFZmLQS
> da6/zh67UF9KuSgprVakvgQ/ITJOfd96USlwZ+E8icJzT36IPWkSmFe6pNEa+KMn
> FiUf0JPN6ivO2q2wuwkIEKIeLiqDNX7QwcMxowMHKxezZobrzdyd4LoLx143mAa/
> Ls0nABaN9bk+jzl3Ffl2Vx7YowuercwGaRzBuPEdxVQflA1gVPi7o/zwJ75CPAre
> ntQk8nWAqpxB30s0/++xYPbYaJFqWtXM2e4AQKQjiZSAdq34yl+q+di/1iGS/u4Q
> gfvGaprAtViK6AqURT8dXrWTv8KzAT2prIs3wdpmrc3V92p1cAo=
> =5ZmQ
> -----END PGP SIGNATURE-----
>
Comment 7 Dan Streetman 2017-01-19 04:43:36 UTC
On Wed, Jan 18, 2017 at 10:00 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (01/18/17 20:36), Dan Streetman wrote:
>> On Wed, Jan 18, 2017 at 8:17 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Wed 18-01-17 10:39:48, Sergey Senozhatsky wrote:
>> >> Cc Dan
>> >>
>> >> On (01/17/17 12:22), Andrew Morton wrote:
>> >> > > https://bugzilla.kernel.org/show_bug.cgi?id=192571
>> >> > >
>> >> > > --- Comment #1 from Gluzskiy Alexandr <sss123next@list.ru> ---
>> >> > > [199961.576604] ------------[ cut here ]------------
>> >> > > [199961.577830] kernel BUG at mm/zswap.c:1108!
>> >>
>> >> zswap didn't manage to decompress the page:
>> >>
>> >> static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>> >>                               struct page *page)
>> >> {
>> >> ...
>> >>       dst = kmap_atomic(page);
>> >>       tfm = *get_cpu_ptr(entry->pool->tfm);
>> >>       ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
>> >>       put_cpu_ptr(entry->pool->tfm);
>> >>       kunmap_atomic(dst);
>> >>       zpool_unmap_handle(entry->pool->zpool, entry->handle);
>> >>       BUG_ON(ret);
>> >>       ^^^^^^^^^^^
>> >
>> > Ugh, why do we even do that? This is not the way how to handle error
>> > situations. AFAIU propagating the error out wouldn't be a big deal
>> > because we would just fallback to regular swap, right?
>>
>> yeah this function definitely should never bug; it's just a callback
>> from the zpool to try to write a page back to the swapcache so the
>> zpool can free a page.  It's definitely ok for it to return an error.
>>
>
> good. Dan, Seth, care to send the patch?

damn, i misread the trace.  I need to stop reading bug emails late at night.

I just sent a patch to change the BUG calls in zswap_writeback_entry()
as those are totally recoverable, but the BUG here is from
zswap_frontswap_load(), and a failure there isn't recoverable.

Unfortunately, if we accepted the page, but now can't recover it, it's
gone and returning error from zswap_frontswap_load() will just cause
page_io.c to go read the swap disk; and what's on there is not the
page we're looking for, it's undefined as we didn't actually ever
write the swap page to disk (unless frontswap_writethrough is
enabled).

I'll have to look at this again in the morning.
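
The point above, that an error return from zswap_frontswap_load() cannot bring the page back, can be illustrated with a toy userspace model. This is only a sketch, not kernel code: the struct, the function names and the 16-byte "page" are all hypothetical, and a simple flag stands in for a failed decompression. It shows that once the cache has accepted a page without writing it to disk, the disk fallback returns bytes that were never that page.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy model of one swap slot; all names here are hypothetical. */
struct toy_slot {
    char disk[TOY_PAGE_SIZE];   /* what the backing swap device holds */
    char cache[TOY_PAGE_SIZE];  /* copy held by the compressed cache */
    bool cached;                /* true once the cache has accepted the page */
    bool corrupt;               /* simulates a decompression failure */
};

/* A fresh slot: the disk holds stale bytes that were never our page. */
void toy_slot_init(struct toy_slot *s)
{
    memset(s, 0, sizeof(*s));
    memset(s->disk, 0xAA, sizeof(s->disk));
}

/* Store: the cache accepts the page, so nothing is written to disk. */
void toy_store(struct toy_slot *s, const char *page)
{
    memcpy(s->cache, page, TOY_PAGE_SIZE);
    s->cached = true;
}

/* Load from the cache; returns 0 on success, -1 if the copy is unusable. */
int toy_cache_load(const struct toy_slot *s, char *page)
{
    if (!s->cached || s->corrupt)
        return -1;
    memcpy(page, s->cache, TOY_PAGE_SIZE);
    return 0;
}

/* Fallback path: read whatever the disk holds for this slot. */
void toy_disk_load(const struct toy_slot *s, char *page)
{
    memcpy(page, s->disk, TOY_PAGE_SIZE);
}
```

In this model, calling toy_disk_load() after toy_cache_load() has failed hands the caller the stale 0xAA bytes, which is the toy equivalent of the undefined swap-disk contents described above.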

> and one more thing... can you take a look at [1]?

yeah, i've been meaning to get to that as well, will send something tomorrow.

>
> [1] https://marc.info/?l=linux-mm&m=147031191906154&w=4
>
>         -ss
>
Comment 8 Gluzskiy Alexandr 2017-01-24 00:03:13 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


> Why would you do this?  There's no benefit of using zswap together
> with zram.

I just wanted to test zram and zswap; I haven't dug too deep into it yet,
but what I wanted was to use zram swap (with zswap disabled) and, once it
is exhausted, use real swap on a block device with zswap enabled.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEl/sA7WQg6czXWI/dsEApXVthX7wFAliGmbkACgkQsEApXVth
X7zyDQ/9HMnJ5JzAAkrWJKlvpA+H6CRw0YBO77zQ44lr9R5jqmVAhU3XS6+dfYpA
ZL9lwG8zEqSDUCakko4vRVaeOiy3qzCNQcect2J1I9aGrHFIkC0I/ifPpbXRa4s5
+D45mSUzGxnMMz1XZrOkvuNsbzdWuTmQqTUqnJVovRD/V62u8Y50gDL3zkz/9x7L
mLjl/5WGjBAOQtwYpq1uE7FAJFHjV2cX8yI5JrFzMK1oghjFfqPFiYbD0yqSR2MB
QFdDQlqhMZ7Dwnk0P/WzIpJXdoT2NXH1iWRsvvKYeMwRP7hIzEnkpxfTlYtBK5xu
7zw/IEa0prLaEtYEh1j6h8Tzn6wKNeIT3t0g2yBT3QC8BW/v7AODlj95C+jIR06f
tikDCx+DUDuP96SW6RIjVLODCt/4yCzgVdxoAD5AbAyY+pU+JEmDkz8L60Gk2mC9
OG9IExiCCY/G3069A6UZROSFrrZGgrP75JGhTP91cS/XGH/HODFmqHQVVp45cED9
wn820IGjB2AAI6MmmRCvgqzUs99PTv8Xqr/x2Ea/ce+lFiU+L5x+xY7Q1q3KhQpQ
pLqLShi9iQUAzIYAtXZNPlbgwDtbYqz5sIAa6cmiv92bcRgJdPf4SiWLBUzIVi0M
KkNXiyo3XkDXoC8P1WjzLoexoJtOJooUbPcKimCI8Ef6+s5PmC0=
=Pyde
-----END PGP SIGNATURE-----
Comment 9 Dan Streetman 2017-01-24 20:17:35 UTC
On Mon, Jan 23, 2017 at 7:03 PM, Alexandr <sss123next@list.ru> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
>
>> Why would you do this?  There's no benefit of using zswap together
>> with zram.
>
> i just wanted to test zram and zswap, i still not dig to deep in it,
> but what i wanted is to use zram swap (with zswap disabled), and if it
> exceeded use real swap on block device with zswap enabled.

I don't believe that's possible, you can't enable zswap for only
specific swap devices; and anyway, if you fill up zram, you won't
really have any memory left for zswap to use will you?

However, it shouldn't encounter any BUG(), like you saw.  If it's
reproducible for you, can you give details on how to reproduce it?
Comment 10 Gluzskiy Alexandr 2017-01-25 07:26:14 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Tue, 24/01/2017 at 15:16 -0500, Dan Streetman wrote:
> On Mon, Jan 23, 2017 at 7:03 PM, Alexandr <sss123next@list.ru> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA512
> > 
> > 
> > > Why would you do this?  There's no benefit of using zswap
> > > together
> > > with zram.
> > 
> > i just wanted to test zram and zswap, i still not dig to deep in
> > it,
> > but what i wanted is to use zram swap (with zswap disabled), and if
> > it
> > exceeded use real swap on block device with zswap enabled.
> 
> I don't believe that's possible, you can't enable zswap for only
> specific swap devices; and anyway, if you fill up zram, you won't
> really have any memory left for zswap to use will you?
> 
> However, it shouldn't encounter any BUG(), like you saw.  If it's
> reproducable for you, can you give details on how to reproduce it?
It happened only once, and I noticed it only because a few
applications hung, but I can run this setup for a while and let you know
if it happens again. It happened with an I/O- and memory-intensive app; this
machine is under heavy load sometimes, so I think it may happen again.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEl/sA7WQg6czXWI/dsEApXVthX7wFAliISMcACgkQsEApXVth
X7zgMA/+KoI1rdpCfJdxrihlkKavJcfYR/EoI4FGzQadJb6mZSihzuHcLVcIhiLV
VH+9HNADgygur//EQMAsliqT7HNxdEIpouMU/4w9dDxWiUFaAFo6kQYztVSQog8X
Kd3zJ1YagxSOXv0yx/OiR40/NwXygLSW8zRQ0rVwOIO6TF05lJYUA5QQ6F+izHGB
syNDQUwOukQ8Bcaxctic+uE/nn55ufkHMyjCtlQG2jG6/gk1590fzxugsk69U0Ou
qq8zFyShhYQ2onw36cJWi62rXpKvj7mj/suo7FwwmmLBS2R9jcrQILTnYnhAM+YH
JkVsIjXJrIWGLd3jeFpHwJMmvuMe5jPT3ppGGx3m4QbdRe+DAujT+5bWaQC5ubnN
4H84h6kGEsNTelf2rfZs58MomQy61adgSwKqMpOw81b/H11fYuZTVmlqBkyFKzos
0fkSTdkpHXSoKkLw6sgr2ch+jLJanR29+T9VRuR2m4+PRdLrUZF3L5HBejYDkE5O
3eF+eR/cVXoyZleVUAJaG7KAM+P8KEvz5kZAOOTGixFM23L1KnIRejYcjpYgKGkG
Q4k5+w56ONkzmL6IKqx5eOHstCxSl1R/uKNNN9rwrq1sRuRUpcQlxFfneVS7U9eo
pKYcoyO/yiYKdXTH82d/LJBf6yISZcwMsBSPciSWXQuLPtvkeNE=
=PnJw
-----END PGP SIGNATURE-----
Comment 11 Dan Streetman 2017-01-26 17:09:49 UTC
On Tue, Jan 24, 2017 at 11:02 PM, Chulmin Kim <cmlaika.kim@samsung.com> wrote:
> On 01/24/2017 03:16 PM, Dan Streetman wrote:
>>
>> On Mon, Jan 23, 2017 at 7:03 PM, Alexandr <sss123next@list.ru> wrote:
>>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA512
>>>
>>>
>>>> Why would you do this?  There's no benefit of using zswap together
>>>> with zram.
>>>
>>>
>>> i just wanted to test zram and zswap, i still not dig to deep in it,
>>> but what i wanted is to use zram swap (with zswap disabled), and if it
>>> exceeded use real swap on block device with zswap enabled.
>>
>>
>> I don't believe that's possible, you can't enable zswap for only
>> specific swap devices; and anyway, if you fill up zram, you won't
>> really have any memory left for zswap to use will you?
>>
>> However, it shouldn't encounter any BUG(), like you saw.  If it's
>> reproducable for you, can you give details on how to reproduce it?
>>
>
> Hello. Mr. Streetman.
>
>
> Regarding to this problem, I have a question on zswap.
>
> Is there any reason that
> zswap_frontswap_load() does not call flush_dcache_page()?
>
> The zswap load function can dirty the page mapped to user space (might be
> shareable/writable) which seems exactly the condition mentioned in the
> definition of flush_dcache_page().
>
> I'm thinking that
> flush_dcache_page() should be called in the end of zswap_frontswap_load().
> Could you review my opinion?

I don't think it needs to, as i detailed in my response to the other thread.

Also, this is a different issue, I think - even if there is a cache
problem with pages loaded from zswap, i don't see how it would cause a
decompression failure - the zpool storage is the only code that has a
copy of its compressed pages, no userspace or any other kernel code
should be accessing any of it.

>
> Thanks!
> Chulmin Kim
Comment 12 Dan Berindei 2020-02-13 09:24:09 UTC
I got a very similar BUG, but without zram.

My kernel arguments are

zswap.enabled=1 zswap.max_pool_percent=20 zswap.zpool=z3fold zswap.compressor=lz4 mitigations=off systemd.unified_cgroup_hierarchy=0
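
For readers unfamiliar with zswap.max_pool_percent: it caps the compressed pool at a percentage of total RAM. A minimal sketch of that arithmetic follows, assuming the cap is applied in whole pages with integer division; the function name is hypothetical and this is not the kernel's own code.

```c
#include <stdint.h>

/*
 * Upper bound on the compressed pool, in pages, for a given percentage
 * cap. Hypothetical helper; uses truncating integer division.
 */
uint64_t pool_limit_pages(uint64_t total_ram_pages,
                          unsigned int max_pool_percent)
{
    return total_ram_pages * max_pool_percent / 100;
}
```

As an example with made-up numbers, a machine with 4194304 pages of RAM (16 GiB with 4 KiB pages) and a 20 percent cap would get a limit of 838860 pages. The RAM size of the machine in this report is not stated.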


kernel BUG at mm/zswap.c:1167!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 PID: 10002 Comm: Sweeper thread Tainted: G           OE     5.4.17-200.fc31.x86_64 #1
Hardware name: LENOVO 20FXS0BB14/20FXS0BB14, BIOS R07ET72W (2.12 ) 10/28/2016
RIP: 0010:zswap_frontswap_load+0x1b0/0x1c0
Code: 47 10 e8 c3 78 b4 00 41 89 c4 83 ab e8 21 00 00 01 48 8b 45 28 48 8b 75 30 48 8b 38 e8 59 2d 04 00 45 85 e4 0f 84 11 ff ff ff <0f> 0b e8 09 6c e2 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41
RSP: 0000:ffffad7c03c8fc40 EFLAGS: 00010282
RAX: 00000000000000f0 RBX: ffff8b7b93dbce00 RCX: 000000000000000c
RDX: 00000000000000fc RSI: ffff8b7bbcf9d688 RDI: ffff8b7bebbbb010
RBP: ffff8b78166c7a10 R08: ffff8b7c2f08f900 R09: ffff8b7bebbbbd68
R10: ffff8b7bebbbb088 R11: ffff8b7bebbbbd59 R12: 00000000ffffffea
R13: ffff8b7c2930bde0 R14: ffff8b7c2930bde8 R15: ffff8b7bebbbb088
FS:  00007efd5d977700(0000) GS:ffff8b7c31840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efd7840c188 CR3: 0000000343254006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __frontswap_load+0x9c/0xf0
 swap_readpage+0xf0/0x2d0
 swapin_readahead+0x3ff/0x4c0
 do_swap_page+0x361/0x820
 __handle_mm_fault+0x901/0x1590
 handle_mm_fault+0xc4/0x1f0
 do_user_addr_fault+0x1f9/0x450
 do_page_fault+0x31/0x110
 page_fault+0x3e/0x50
RIP: 0033:0x7efd8eb621cb
Code: 8b 87 f8 00 00 00 55 48 89 e5 48 83 e8 01 83 e0 10 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 31 c0 48 85 f6 48 89 e5 74 34 <80> 7e 08 00 75 2a 4c 8b 47 18 8b 8f 00 01 00 00 48 89 f2 4c 29 c2
RSP: 002b:00007efd5d976b20 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00007efd8f9b9b80 RCX: 0000000000000007
RDX: 00007efd78267000 RSI: 00007efd7840c180 RDI: 00007efd88038140
RBP: 00007efd5d976b20 R08: 00007efd8826fa10 R09: 00007efd8826fa10
R10: 00007efd5d976ad0 R11: 0000000000008722 R12: 00007efd7840bb10
R13: 00007efd88038140 R14: 0000000000000690 R15: 000000000000033e
Modules linked in: bnep ccm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc xfs snd_hda_codec_hdmi iwlmvm intel_rapl_msr snd_hda_codec_realtek intel_rapl_common mac80211 snd_hda_codec_generic btusb btrtl btbcm btintel uvcvideo bluetooth snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg videobuf2_memops snd_hda_codec videobuf2_v4l2 libarc4 x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp videobuf2_common iwlwifi snd_hwdep videodev mei_hdcp snd_seq iTCO_wdt mei_wdt ecdh_generic kvm_intel iTCO_vendor_support mc ecc snd_seq_device kvm
 snd_pcm cfg80211 irqbypass rtsx_pci_ms intel_cstate pcspkr memstick snd_timer intel_uncore i2c_i801 mei_me intel_rapl_perf joydev wmi_bmof thinkpad_acpi mei intel_pch_thermal ledtrig_audio snd soundcore vboxnetadp(OE) vboxnetflt(OE) binfmt_misc vboxdrv(OE) ip_tables btrfs libcrc32c xor zstd_decompress zstd_compress raid6_pq dm_crypt rfkill i915 nouveau mxm_wmi rtsx_pci_sdmmc crct10dif_pclmul ttm mmc_core crc32_pclmul i2c_algo_bit crc32c_intel drm_kms_helper drm e1000e ghash_clmulni_intel rtsx_pci serio_raw wmi video fuse lz4 lz4_compress
---[ end trace 86f7b18068c500ae ]---
RIP: 0010:zswap_frontswap_load+0x1b0/0x1c0
Code: 47 10 e8 c3 78 b4 00 41 89 c4 83 ab e8 21 00 00 01 48 8b 45 28 48 8b 75 30 48 8b 38 e8 59 2d 04 00 45 85 e4 0f 84 11 ff ff ff <0f> 0b e8 09 6c e2 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41
RSP: 0000:ffffad7c03c8fc40 EFLAGS: 00010282
RAX: 00000000000000f0 RBX: ffff8b7b93dbce00 RCX: 000000000000000c
RDX: 00000000000000fc RSI: ffff8b7bbcf9d688 RDI: ffff8b7bebbbb010
RBP: ffff8b78166c7a10 R08: ffff8b7c2f08f900 R09: ffff8b7bebbbbd68
R10: ffff8b7bebbbb088 R11: ffff8b7bebbbbd59 R12: 00000000ffffffea
R13: ffff8b7c2930bde0 R14: ffff8b7c2930bde8 R15: ffff8b7bebbbb088
FS:  00007efd5d977700(0000) GS:ffff8b7c31840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efd7840c188 CR3: 0000000343254006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Comment 13 Michael 2020-07-07 11:54:14 UTC
I also got a very similar bug with zswap (without zram).

zswap preferences:
ZSWAP_ENABLED=Y
ZSWAP_SAME_FILLED_PAGES_ENABLED=Y
ZSWAP_ACCEPT_THRESHOLD_PERCENT=90
ZSWAP_MAX_POOL_PERCENT=60
ZSWAP_COMPRESSOR=zstd
ZSWAP_ZPOOL=z3fold

[12876.327430] ------------[ cut here ]------------
[12876.327433] kernel BUG at mm/zswap.c:1184!
[12876.327480] invalid opcode: 0000 [#1] SMP NOPTI
[12876.327484] CPU: 2 PID: 404784 Comm: java Tainted: P           OE     5.7.7-200.fc32.x86_64 #1
[12876.327485] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./960GM-VGS3 FX, BIOS P1.40 07/23/2015
[12876.327492] RIP: 0010:zswap_frontswap_load+0x197/0x1a0
[12876.327495] Code: 2b 3d e5 c6 17 01 48 8b 45 30 b9 00 02 00 00 48 c1 ff 06 48 c1 e7 0c 48 03 3d dd c6 17 01 f3 48 ab 83 aa f0 21 00 00 01 eb 94 <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 57 49 89 f7 41 56 49
[12876.327497] RSP: 0000:ffffb15b0adefc88 EFLAGS: 00010282
[12876.327498] RAX: 0000000000000000 RBX: ffff951da176ca80 RCX: 0000000000000000
[12876.327500] RDX: 00000000000000d0 RSI: 0000000000000000 RDI: ffff951e59e15010
[12876.327501] RBP: ffff951e59e1b9a0 R08: ffff95204c482900 R09: 0000000000000000
[12876.327502] R10: 00000000000003b0 R11: 0000000000001000 R12: 00000000ffffffea
[12876.327503] R13: ffff95204fbf8670 R14: ffff95204fbf8678 R15: ffff951e59e15648
[12876.327504] FS:  00007f0fe8fe0700(0000) GS:ffff952053c80000(0000) knlGS:0000000000000000
[12876.327505] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12876.327506] CR2: 00000005f46e60c8 CR3: 000000015ba0c000 CR4: 00000000000006e0
[12876.327507] Call Trace:
[12876.327515]  __frontswap_load+0xaa/0x100
[12876.327517]  swap_readpage+0xec/0x310
[12876.327519]  swap_cluster_readahead+0x2ac/0x2e0
[12876.327523]  ? pagecache_get_page+0x37/0x3e0
[12876.327526]  do_swap_page+0x312/0x760
[12876.327528]  ? wp_page_reuse+0x58/0x70
[12876.327529]  ? do_wp_page+0x160/0x4a0
[12876.327531]  __handle_mm_fault+0xb33/0x1700
[12876.327535]  ? __switch_to_asm+0x34/0x70
[12876.327536]  ? __switch_to_asm+0x40/0x70
[12876.327538]  handle_mm_fault+0xc0/0x1e0
[12876.327542]  do_user_addr_fault+0x1f9/0x490
[12876.327544]  page_fault+0x3e/0x50
[12876.327547] RIP: 0033:0x7f0fd46f6fb3
[12876.327549] Code: 85 d2 75 12 49 8b f7 49 ba 80 12 cb e9 0f 7f 00 00 41 ff d2 eb 10 4b 89 7c 13 f8 49 83 c2 f8 4d 89 97 48 03 00 00 41 8b 6d 58 <45> 8b 5c ec 08 41 ba b2 9b 00 f8 49 b8 00 00 00 00 00 00 00 00 4f
[12876.327550] RSP: 002b:00007f0fe8fdf170 EFLAGS: 00010246
[12876.327551] RAX: 00000005f46e45f0 RBX: 00000000be8dc8cf RCX: 0000000000000000
[12876.327552] RDX: 00000005c02b5bc0 RSI: 00000005f46e45f0 RDI: 00007f0f5acc8070
[12876.327553] RBP: 00000000be8dcc18 R08: 0000000000000003 R09: 00000000be8dc8be
[12876.327554] R10: 0000000000000000 R11: 00000005f46e4678 R12: 0000000000000000
[12876.327555] R13: 00000005f46e45f0 R14: 0000000000000003 R15: 00007f0fe4012800
[12876.327558] Modules linked in: vfat fat uas usb_storage snd_seq_dummy snd_hrtimer nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) drm_kms_helper ipmi_devintf ipmi_msghandler xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables rfkill nfnetlink ip6table_filter ip6_tables iptable_filter zstd sunrpc edac_mce_amd kvm_amd ccp snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg irqbypass snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device pcspkr wmi_bmof joydev snd_pcm k10temp sp5100_tco i2c_piix4 snd_timer snd soundcore drm ip_tables ata_generic pata_acpi
[12876.327590]  serio_raw pata_atiixp atl1c wmi btrfs blake2b_generic libcrc32c xor raid6_pq fuse
[12876.327597] ---[ end trace 691263f645ef52be ]---
[12876.327599] RIP: 0010:zswap_frontswap_load+0x197/0x1a0
[12876.327601] Code: 2b 3d e5 c6 17 01 48 8b 45 30 b9 00 02 00 00 48 c1 ff 06 48 c1 e7 0c 48 03 3d dd c6 17 01 f3 48 ab 83 aa f0 21 00 00 01 eb 94 <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 57 49 89 f7 41 56 49
[12876.327602] RSP: 0000:ffffb15b0adefc88 EFLAGS: 00010282
[12876.327603] RAX: 0000000000000000 RBX: ffff951da176ca80 RCX: 0000000000000000
[12876.327604] RDX: 00000000000000d0 RSI: 0000000000000000 RDI: ffff951e59e15010
[12876.327605] RBP: ffff951e59e1b9a0 R08: ffff95204c482900 R09: 0000000000000000
[12876.327606] R10: 00000000000003b0 R11: 0000000000001000 R12: 00000000ffffffea
[12876.327607] R13: ffff95204fbf8670 R14: ffff95204fbf8678 R15: ffff951e59e15648
[12876.327608] FS:  00007f0fe8fe0700(0000) GS:ffff952053c80000(0000) knlGS:0000000000000000
[12876.327609] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12876.327610] CR2: 00000005f46e60c8 CR3: 000000015ba0c000 CR4: 00000000000006e0
Comment 14 eva 2020-10-30 04:15:06 UTC
I got the same error. I don't think I am using zram, though shm is part of Manjaro.

I created two swap files on two different SSDs. I am running a large computation that requires about 100 GB of memory, so I created 110 GB of swap in addition to my 32 GB of RAM.


Using 

30.10.20 03:52	kernel	------------[ cut here ]------------
30.10.20 03:52	kernel	kernel BUG at mm/zswap.c:1184!
30.10.20 03:52	kernel	invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
30.10.20 03:52	kernel	CPU: 14 PID: 412725 Comm: QThread Tainted: G      D    OE     5.9.1-1-MANJARO #1
30.10.20 03:52	kernel	Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.H0 06/12/2020
30.10.20 03:52	kernel	RIP: 0010:zswap_frontswap_load+0x240/0x260
30.10.20 03:52	kernel	Code: 00 00 e8 63 d1 e2 ff 65 8b 05 ec 28 d8 52 85 c0 0f 85 61 ff ff ff e8 86 f7 d6 ff e9 57 ff ff ff e8 7c f7 d6 ff e9 35 ff ff ff <0f> 0b e8 70 f7 d6 ff e9 00 ff ff ff e8 3f 41 7a 00 66 66 2e 0f 1f
30.10.20 03:52	kernel	RSP: 0000:ffffa7260add7c78 EFLAGS: 00010282
30.10.20 03:52	kernel	RAX: 0000000080000000 RBX: 00000000ffffffea RCX: 0000000000000000
30.10.20 03:52	kernel	RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
30.10.20 03:52	kernel	RBP: ffff8f818760fc40 R08: ffff8f82b5006a80 R09: 0000000000000000
30.10.20 03:52	kernel	R10: 00000000000002de R11: 0000000000001000 R12: ffff8f8130661000
30.10.20 03:52	kernel	R13: ffff8f814b590508 R14: ffff8f814b590500 R15: ffff8f7e6dc43d08
30.10.20 03:52	kernel	FS:  00007fd82c1f8640(0000) GS:ffff8f82beb80000(0000) knlGS:0000000000000000
30.10.20 03:52	kernel	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
30.10.20 03:52	kernel	CR2: 00007fce005deb28 CR3: 0000000703ba4000 CR4: 0000000000350ee0
30.10.20 03:52	kernel	Call Trace:
30.10.20 03:52	kernel	 __frontswap_load+0x79/0xc0
30.10.20 03:52	kernel	 swap_readpage+0xb1/0x260
30.10.20 03:52	kernel	 swapin_readahead+0x454/0x520
30.10.20 03:52	kernel	 do_swap_page+0x45f/0x7e0
30.10.20 03:52	kernel	 handle_mm_fault+0xe2d/0x1a40
30.10.20 03:52	kernel	 ? blk_queue_exit+0xe/0x50
30.10.20 03:52	kernel	 do_user_addr_fault+0x1e3/0x420
30.10.20 03:52	kernel	 exc_page_fault+0x82/0x1c0
30.10.20 03:52	kernel	 ? asm_exc_page_fault+0x8/0x30
30.10.20 03:52	kernel	 asm_exc_page_fault+0x1e/0x30
30.10.20 03:52	kernel	RIP: 0033:0x55e3113bcde3
30.10.20 03:52	kernel	Code: 2e 00 50 ba 99 00 00 00 48 8d 35 b8 d6 2e 00 48 8d 3d d0 d6 2e 00 ff 15 63 55 4e 00 66 2e 0f 1f 84 00 00 00 00 00 90 48 8b 17 <83> 6a 18 01 75 3f 55 53 48 89 fb 48 83 ec 08 48 8b 2f 48 8d 7d 10
30.10.20 03:52	kernel	RSP: 002b:00007fd82c1f6f18 EFLAGS: 00010213
30.10.20 03:52	kernel	RAX: 00007fd8869b2360 RBX: 00007fd08f774510 RCX: 0000000000000c00
30.10.20 03:52	kernel	RDX: 00007fce005deb10 RSI: 0000000000000000 RDI: 00007fd5a185d390
30.10.20 03:52	kernel	RBP: 00007fd5a185d360 R08: 0000000000000000 R09: 0000000000000080
30.10.20 03:52	kernel	R10: 0000000000001000 R11: 0000000000000000 R12: 00007fd5a185c690
30.10.20 03:52	kernel	R13: 00007fd0cb207bf0 R14: 00007fca657805d0 R15: 00007fcc7bef94b0
30.10.20 03:52	kernel	Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse mousedev joydev input_leds hid_generic snd_usb_audio snd_usbmidi_lib snd_rawmidi usbhid snd_seq_device hid mc rfkill nct6775 hwmon_vid uas usb_storage wmi_bmof amdgpu nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd snd_hda_codec_hdmi kvm_amd gpu_sched i2c_algo_bit ttm snd_hda_intel snd_intel_dspcfg drm_kms_helper kvm snd_hda_codec snd_hda_core cec irqbypass snd_hwdep crct10dif_pclmul snd_pcm rc_core crc32_pclmul ghash_clmulni_intel aesni_intel snd_timer ccp crypto_simd syscopyarea sp5100_tco snd sysfillrect cryptd glue_helper sysimgblt rapl fb_sys_fops pcspkr soundcore i2c_piix4 k10temp rng_core r8168(OE) wmi evdev mac_hid pinctrl_amd gpio_amdpt acpi_cpufreq uinput i2c_dev vboxnetflt(OE) vboxnetadp(OE) drm vboxdrv(OE) sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_hcd
30.10.20 03:52	kernel	---[ end trace b9e0629bde4bd2ef ]---
30.10.20 03:52	kernel	RIP: 0010:zswap_frontswap_load+0x240/0x260
30.10.20 03:52	kernel	Code: 00 00 e8 63 d1 e2 ff 65 8b 05 ec 28 d8 52 85 c0 0f 85 61 ff ff ff e8 86 f7 d6 ff e9 57 ff ff ff e8 7c f7 d6 ff e9 35 ff ff ff <0f> 0b e8 70 f7 d6 ff e9 00 ff ff ff e8 3f 41 7a 00 66 66 2e 0f 1f
30.10.20 03:52	kernel	RSP: 0000:ffffa7260a583c78 EFLAGS: 00010282
30.10.20 03:52	kernel	RAX: 0000000080000000 RBX: 00000000ffffffea RCX: 0000000000000000
30.10.20 03:52	kernel	RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
30.10.20 03:52	kernel	RBP: ffff8f7ae9817150 R08: ffff8f82b5006a80 R09: 0000000000000000
30.10.20 03:52	kernel	R10: 00000000000004e0 R11: 0000000000001000 R12: ffff8f7d95e04000
30.10.20 03:52	kernel	R13: ffff8f814b590508 R14: ffff8f814b590500 R15: ffff8f80ce710588
30.10.20 03:52	kernel	FS:  00007fd82c1f8640(0000) GS:ffff8f82beb80000(0000) knlGS:0000000000000000
30.10.20 03:52	kernel	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
30.10.20 03:52	kernel	CR2: 00007fce005deb28 CR3: 0000000703ba4000 CR4: 0000000000350ee0
30.10.20 03:52	kernel	------------[ cut here ]------------
30.10.20 03:52	kernel	WARNING: CPU: 14 PID: 412725 at kernel/exit.c:721 do_exit+0x47/0xa90
30.10.20 03:52	kernel	Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse mousedev joydev input_leds hid_generic snd_usb_audio snd_usbmidi_lib snd_rawmidi usbhid snd_seq_device hid mc rfkill nct6775 hwmon_vid uas usb_storage wmi_bmof amdgpu nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd snd_hda_codec_hdmi kvm_amd gpu_sched i2c_algo_bit ttm snd_hda_intel snd_intel_dspcfg drm_kms_helper kvm snd_hda_codec snd_hda_core cec irqbypass snd_hwdep crct10dif_pclmul snd_pcm rc_core crc32_pclmul ghash_clmulni_intel aesni_intel snd_timer ccp crypto_simd syscopyarea sp5100_tco snd sysfillrect cryptd glue_helper sysimgblt rapl fb_sys_fops pcspkr soundcore i2c_piix4 k10temp rng_core r8168(OE) wmi evdev mac_hid pinctrl_amd gpio_amdpt acpi_cpufreq uinput i2c_dev vboxnetflt(OE) vboxnetadp(OE) drm vboxdrv(OE) sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_hcd
30.10.20 03:52	kernel	CPU: 14 PID: 412725 Comm: QThread Tainted: G      D    OE     5.9.1-1-MANJARO #1
30.10.20 03:52	kernel	Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.H0 06/12/2020
30.10.20 03:52	kernel	RIP: 0010:do_exit+0x47/0xa90
30.10.20 03:52	kernel	Code: ec 48 65 48 8b 04 25 28 00 00 00 48 89 44 24 40 31 c0 48 8b 83 00 08 00 00 48 85 c0 74 0e 48 8b 10 48 39 d0 0f 84 89 04 00 00 <0f> 0b 65 44 8b 25 0f 02 f8 52 41 81 e4 00 ff ff 00 44 89 64 24 0c
30.10.20 03:52	kernel	RSP: 0000:ffffa7260add7ed8 EFLAGS: 00010216
30.10.20 03:52	kernel	RAX: ffffa7260add7d60 RBX: ffff8f7b28580000 RCX: 0000000000000000
30.10.20 03:52	kernel	RDX: ffff8f82b1fc3f48 RSI: ffffffffae383aaa RDI: 000000000000000b
30.10.20 03:52	kernel	RBP: 000000000000000b R08: 0000000000000000 R09: ffffa7260add7940
30.10.20 03:52	kernel	R10: 0000000000000000 R11: ffffa7260add7946 R12: 000000000000000b
30.10.20 03:52	kernel	R13: 0000000000000000 R14: ffff8f7b28580000 R15: 0000000000000006
30.10.20 03:52	kernel	FS:  00007fd82c1f8640(0000) GS:ffff8f82beb80000(0000) knlGS:0000000000000000
30.10.20 03:52	kernel	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
30.10.20 03:52	kernel	CR2: 00007fce005deb28 CR3: 0000000703ba4000 CR4: 0000000000350ee0
30.10.20 03:52	kernel	Call Trace:
30.10.20 03:52	kernel	 rewind_stack_do_exit+0x17/0x17
30.10.20 03:52	kernel	RIP: 0033:0x55e3113bcde3
30.10.20 03:52	kernel	Code: 2e 00 50 ba 99 00 00 00 48 8d 35 b8 d6 2e 00 48 8d 3d d0 d6 2e 00 ff 15 63 55 4e 00 66 2e 0f 1f 84 00 00 00 00 00 90 48 8b 17 <83> 6a 18 01 75 3f 55 53 48 89 fb 48 83 ec 08 48 8b 2f 48 8d 7d 10
30.10.20 03:52	kernel	RSP: 002b:00007fd82c1f6f18 EFLAGS: 00010213
30.10.20 03:52	kernel	RAX: 00007fd8869b2360 RBX: 00007fd08f774510 RCX: 0000000000000c00
30.10.20 03:52	kernel	RDX: 00007fce005deb10 RSI: 0000000000000000 RDI: 00007fd5a185d390
30.10.20 03:52	kernel	RBP: 00007fd5a185d360 R08: 0000000000000000 R09: 0000000000000080
30.10.20 03:52	kernel	R10: 0000000000001000 R11: 0000000000000000 R12: 00007fd5a185c690
30.10.20 03:52	kernel	R13: 00007fd0cb207bf0 R14: 00007fca657805d0 R15: 00007fcc7bef94b0
30.10.20 03:52	kernel	---[ end trace b9e0629bde4bd2f0 ]---
Comment 15 Roman Odaisky 2020-12-10 19:06:21 UTC
I’m also running into this monthly or so while using zswap but NOT zram:

Dec 10 20:49:19 xps kernel: [1872599.779552] ------------[ cut here ]------------
Dec 10 20:49:19 xps kernel: [1872599.779554] kernel BUG at mm/zswap.c:1184!
Dec 10 20:49:19 xps kernel: [1872599.779561] invalid opcode: 0000 [#1] SMP PTI
Dec 10 20:49:19 xps kernel: [1872599.779563] CPU: 2 PID: 919648 Comm: chromium Not tainted 5.8.0-28-generic #30-Ubuntu
Dec 10 20:49:19 xps kernel: [1872599.779564] Hardware name: Dell Inc. XPS 15 9570/02MJVY, BIOS 1.15.0 12/25/2019
Dec 10 20:49:19 xps kernel: [1872599.779569] RIP: 0010:zswap_frontswap_load+0x1bd/0x1d0
Dec 10 20:49:19 xps kernel: [1872599.779571] Code: 3d 00 1d 1d 01 49 8b 44 24 30 b9 00 02 00 00 48 c1 ff 06 48 c1 e7 0c 48 03 3d f7 1c 1d 01 f3 48 ab 83 aa f0 12 00 00 01 eb 84 <0f> 0b e8 6c dd 8f 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
Dec 10 20:49:19 xps kernel: [1872599.779573] RSP: 0000:ffffbbea01d63bf8 EFLAGS: 00010282
Dec 10 20:49:19 xps kernel: [1872599.779574] RAX: 0000000000000000 RBX: ffff981ac4456000 RCX: 0000000000000003
Dec 10 20:49:19 xps kernel: [1872599.779575] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff981f0611e010
Dec 10 20:49:19 xps kernel: [1872599.779576] RBP: ffffbbea01d63c38 R08: ffff981f25722600 R09: 0000000000000003
Dec 10 20:49:19 xps kernel: [1872599.779577] R10: ffff981e00d64000 R11: ffffbbea01d63bc0 R12: ffff981ac36b4b28
Dec 10 20:49:19 xps kernel: [1872599.779578] R13: 00000000ffffffea R14: ffff981f17d9c098 R15: ffff981f17d9c090
Dec 10 20:49:19 xps kernel: [1872599.779579] FS:  00007f7bb1c963c0(0000) GS:ffff981f2c480000(0000) knlGS:0000000000000000
Dec 10 20:49:19 xps kernel: [1872599.779580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 10 20:49:19 xps kernel: [1872599.779581] CR2: 0000558e738af000 CR3: 000000023e50a005 CR4: 00000000003606e0
Dec 10 20:49:19 xps kernel: [1872599.779582] Call Trace:
Dec 10 20:49:19 xps kernel: [1872599.779586]  __frontswap_load+0x80/0xd0
Dec 10 20:49:19 xps kernel: [1872599.779588]  swap_readpage+0xaf/0x260
Dec 10 20:49:19 xps kernel: [1872599.779589]  swap_cluster_readahead+0x1c4/0x310
Dec 10 20:49:19 xps kernel: [1872599.779591]  swapin_readahead+0x2a/0x30
Dec 10 20:49:19 xps kernel: [1872599.779593]  do_swap_page+0x466/0x800
Dec 10 20:49:19 xps kernel: [1872599.779595]  handle_pte_fault+0x201/0x260
Dec 10 20:49:19 xps kernel: [1872599.779597]  __handle_mm_fault+0x599/0x7c0
Dec 10 20:49:19 xps kernel: [1872599.779598]  ? blk_mq_force_complete_rq+0x70/0xf0
Dec 10 20:49:19 xps kernel: [1872599.779600]  handle_mm_fault+0xc6/0x1f0
Dec 10 20:49:19 xps kernel: [1872599.779603]  do_user_addr_fault+0x1e2/0x450
Dec 10 20:49:19 xps kernel: [1872599.779606]  exc_page_fault+0x86/0x1a0
Dec 10 20:49:19 xps kernel: [1872599.779608]  ? asm_exc_page_fault+0x8/0x30
Dec 10 20:49:19 xps kernel: [1872599.779609]  asm_exc_page_fault+0x1e/0x30
Dec 10 20:49:19 xps kernel: [1872599.779611] RIP: 0033:0x7f7bb8de51d3
Dec 10 20:49:19 xps kernel: [1872599.779612] Code: 17 e0 c5 f8 77 c3 48 3b 15 e2 1d 06 00 0f 83 25 01 00 00 48 39 f7 72 0f 74 12 4c 8d 0c 16 4c 39 cf 0f 82 c5 01 00 00 48 89 d1 <f3> a4 c3 80 fa 10 73 17 80 fa 08 73 27 80 fa 04 73 33 80 fa 01 77
Dec 10 20:49:19 xps kernel: [1872599.779613] RSP: 002b:00007ffcda8aae48 EFLAGS: 00010206
Dec 10 20:49:19 xps kernel: [1872599.779614] RAX: 0000558e73898a20 RBX: 00007ffcda8aaf50 RCX: 0000000000061a20
Dec 10 20:49:19 xps kernel: [1872599.779615] RDX: 0000000000078000 RSI: 0000558e67c61900 RDI: 0000558e738af000
Dec 10 20:49:19 xps kernel: [1872599.779616] RBP: 0000000000078000 R08: 0000558e73898a20 R09: 0000558e67cc3320
Dec 10 20:49:19 xps kernel: [1872599.779617] R10: 0000558e726f9c10 R11: 00007f7bb8e43c00 R12: 00007ffcda8aaf60
Dec 10 20:49:19 xps kernel: [1872599.779618] R13: 0000000000000000 R14: 0000000000000001 R15: 0000558e73898a20
Dec 10 20:49:19 xps kernel: [1872599.779620] Modules linked in: sctp ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid veth ccm rfcomm xt_nat xt_tcpudp xt_MASQUERADE nf_conntrack_netlink nft_counter nft_chain_nat xt_addrtype nft_compat nf_tables nfnetlink xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc xfrm_user xfrm_algo snd_seq_dummy snd_hrtimer cmac algif_hash algif_skcipher af_alg dell_rbu snd_hda_codec_hdmi bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev btusb mc btrtl btbcm btintel cdc_acm bluetooth ecdh_generic ecc aufs overlay ipmi_devintf ipmi_msghandler z3fold snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_hda_codec_realtek snd_soc_acpi snd_hda_codec_generic binfmt_misc snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep
Dec 10 20:49:19 xps kernel: [1872599.779666]  x86_pkg_temp_thermal snd_pcm intel_powerclamp mei_hdcp intel_rapl_msr coretemp nls_iso8859_1 snd_seq_midi kvm_intel ath10k_pci snd_seq_midi_event kvm ath10k_core snd_rawmidi dell_laptop ledtrig_audio dell_wmi ath rapl dell_smbios intel_cstate snd_seq efi_pstore dcdbas mac80211 snd_seq_device input_leds joydev snd_timer sparse_keymap serio_raw dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof cfg80211 snd mei_me processor_thermal_device soundcore intel_rapl_common ee1004 libarc4 mei intel_pch_thermal intel_soc_dts_iosf int3400_thermal acpi_thermal_rel int3403_thermal int340x_thermal_zone dell_smo8800 mac_hid acpi_pad sch_fq_codel dell_smm_hwmon parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid i915 rtsx_pci_sdmmc i2c_algo_bit crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel aesni_intel syscopyarea sysfillrect crypto_simd sysimgblt cryptd fb_sys_fops
Dec 10 20:49:19 xps kernel: [1872599.779689]  glue_helper cec rc_core psmouse mxm_wmi i2c_i801 nvme drm intel_lpss_pci i2c_smbus nvme_core ahci intel_lpss rtsx_pci xhci_pci idma64 libahci xhci_pci_renesas virt_dma i2c_hid hid wmi video pinctrl_cannonlake pinctrl_intel
Dec 10 20:49:19 xps kernel: [1872599.779697] ---[ end trace 0201eabeb65213f5 ]---
Dec 10 20:49:19 xps kernel: [1872600.560549] RIP: 0010:zswap_frontswap_load+0x1bd/0x1d0
Dec 10 20:49:19 xps kernel: [1872600.560571] Code: 3d 00 1d 1d 01 49 8b 44 24 30 b9 00 02 00 00 48 c1 ff 06 48 c1 e7 0c 48 03 3d f7 1c 1d 01 f3 48 ab 83 aa f0 12 00 00 01 eb 84 <0f> 0b e8 6c dd 8f 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
Dec 10 20:49:19 xps kernel: [1872600.560572] RSP: 0000:ffffbbea01d63bf8 EFLAGS: 00010282
Dec 10 20:49:19 xps kernel: [1872600.560574] RAX: 0000000000000000 RBX: ffff981ac4456000 RCX: 0000000000000003
Dec 10 20:49:19 xps kernel: [1872600.560575] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff981f0611e010
Dec 10 20:49:19 xps kernel: [1872600.560576] RBP: ffffbbea01d63c38 R08: ffff981f25722600 R09: 0000000000000003
Dec 10 20:49:19 xps kernel: [1872600.560579] R10: ffff981e00d64000 R11: ffffbbea01d63bc0 R12: ffff981ac36b4b28
Dec 10 20:49:19 xps kernel: [1872600.560580] R13: 00000000ffffffea R14: ffff981f17d9c098 R15: ffff981f17d9c090
Dec 10 20:49:19 xps kernel: [1872600.560582] FS:  00007f7bb1c963c0(0000) GS:ffff981f2c480000(0000) knlGS:0000000000000000
Dec 10 20:49:19 xps kernel: [1872600.560583] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 10 20:49:19 xps kernel: [1872600.560585] CR2: 0000558e738af000 CR3: 000000023e50a005 CR4: 00000000003606e0

I use zswap by means of the following systemd-swap configuration:

zswap_enabled=1
zswap_compressor=lzo-rle
zswap_max_pool_percent=25
zswap_zpool=z3fold

(other systemd-swap features are turned off)

Is there something I can do to help debug the failure to uncompress the data?
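
One detail that may help when comparing the traces: each register dump contains the value 00000000ffffffea (R13 here, R12 or RBX in other reports). Read as a signed 32-bit integer this is -22, which is -EINVAL on Linux. That this register holds the return value checked by the BUG_ON() is an assumption, not something the log states. A minimal Python sketch of the decoding (the helper names are made up for illustration):

```python
import errno

def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def describe_reg(hex_text: str) -> str:
    """Turn a register value from an oops dump into 'code (ERRNO_NAME)'."""
    code = as_signed32(int(hex_text, 16))
    # Negative values are treated as kernel-style -errno codes (an assumption).
    name = errno.errorcode.get(-code, "unknown") if code < 0 else "not an error"
    return f"{code} ({name})"

print(describe_reg("00000000ffffffea"))  # prints "-22 (EINVAL)"
```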
Comment 16 Zdenek Sojka 2020-12-11 11:42:54 UTC
I am also observing this; it reproduces within ~1 hour of system startup when I am doing CPU/RAM-intensive work (e.g. compiling).
It is much easier to trigger with zsmalloc than with z3fold. I am not using zram.
With z3fold, other issues appear ("Corrupted page table at address ..."), but they usually need hours or days to trigger.

# CONFIG_ZRAM is not set

# uname -a
Linux ntbn61v 5.4.82 #1 SMP Fri Dec 11 09:00:39 CET 2020 x86_64 Intel(R) Core(TM)2 Quad CPU Q9000 @ 2.00GHz GenuineIntel GNU/Linux

I tried running with SLAB and LOCK debug enabled, but nothing triggered; only the above BUG_ON():

# grep DEBUG /usr/src/linux-5.4.82/.config | grep -v MAC80211 | grep =y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_BT_DEBUGFS=y
CONFIG_IWLWIFI_DEBUG=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y
CONFIG_DEBUG_SLAB=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_X86_DEBUG_FPU=y


I only have a photo of the panic messages; no textual form.

My ZSWAP parameters are:
compressor:lz4
enabled:Y
max_pool_percent:75
same_filled_pages_enabled:Y
zpool:zsmalloc
Comment 17 Vitaly 2020-12-12 09:12:33 UTC
Could you please apply the following patches and retest:
https://marc.info/?l=linux-mm&m=160752554706352&w=2 (1)
https://marc.info/?l=linux-mm&m=160752556606357&w=2 (2)
https://marc.info/?l=linux-mm&m=160752557106358&w=2 (3)

Patch (2) is not that relevant here, but it's better if you apply it too just to be consistent.
Comment 18 martin 2020-12-14 17:26:53 UTC
Same problem with 5.9.12. I am using zram + zswap, if that is relevant.

```
/sys/module/zswap/parameters/same_filled_pages_enabled:Y
/sys/module/zswap/parameters/enabled:Y
/sys/module/zswap/parameters/max_pool_percent:20
/sys/module/zswap/parameters/compressor:zstd
/sys/module/zswap/parameters/zpool:z3fold
/sys/module/zswap/parameters/accept_threshold_percent:90
```
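
For anyone who wants to attach the same parameter listing to a report, a rough Python sketch that prints every file under /sys/module/zswap/parameters in the `name:value` form used in this thread. The function name is hypothetical, and the set of parameters present depends on the kernel version.

```python
from pathlib import Path

def read_params(param_dir="/sys/module/zswap/parameters"):
    """Return {name: value} for each readable entry in param_dir."""
    params = {}
    root = Path(param_dir)
    if not root.is_dir():
        return params  # zswap not built in or not loaded on this system
    for entry in sorted(root.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip entries that cannot be read
    return params

for name, value in read_params().items():
    print(f"{name}:{value}")
```

On a machine without zswap the function returns an empty dict and nothing is printed.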

```
Dec 14 08:44:15  kernel: [418189.386438] ------------[ cut here ]------------
Dec 14 08:44:15  kernel: [418189.386442] kernel BUG at mm/zswap.c:1184!
Dec 14 08:44:15  kernel: [418189.386477] invalid opcode: 0000 [#1] SMP NOPTI
Dec 14 08:44:15  kernel: [418189.386494] CPU: 2 PID: 61197 Comm: bsend data Not tainted 5.9.12 #1
Dec 14 08:44:15  kernel: [418189.386517] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
Dec 14 08:44:15  kernel: [418189.386548] RIP: 0010:zswap_frontswap_load+0x1a5/0x1b0
Dec 14 08:44:15  kernel: [418189.386565] Code: 89 fe e8 7e d5 28 00 41 89 c4 83 ab 00 0a 00 00 01 48 8b 45 28 48 8b 75 30 48 8b 38 e8 e4 81 03 00 45 85 e4 0f 84 2b ff ff ff <0f> 0b e8 04 7f 6d 00 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56 49 89
Dec 14 08:44:15  kernel: [418189.388239] RSP: 0000:ffffc90008df3bc0 EFLAGS: 00010282
Dec 14 08:44:15  kernel: [418189.388974] RAX: 0000000000000001 RBX: ffff8880a55f1d80 RCX: ffff88824d6a0cc8
Dec 14 08:44:15  kernel: [418189.388974] RDX: ffffea0000000000 RSI: 0000000000000000 RDI: ffff88824d6a0010
Dec 14 08:44:15  kernel: [418189.388974] RBP: ffff88811c048b60 R08: ffff88821792ba40 R09: ffff88824d6a0cc8
Dec 14 08:44:15  kernel: [418189.388974] R10: 000000000000032a R11: 0000000000000000 R12: 00000000ffffffea
Dec 14 08:44:15  kernel: [418189.388974] R13: ffff888233667bc0 R14: ffff888233667bc8 R15: ffff88824d6a0cc8
Dec 14 08:44:15  kernel: [418189.388974] FS:  00007fc723fef700(0000) GS:ffff88826e080000(0000) knlGS:0000000000000000
Dec 14 08:44:15  kernel: [418189.392966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 14 08:44:15  kernel: [418189.392966] CR2: 00007fc7c230a000 CR3: 0000000239504000 CR4: 00000000003506e0
Dec 14 08:44:15  kernel: [418189.392966] Call Trace:
Dec 14 08:44:15  kernel: [418189.392966]  __frontswap_load+0x6f/0xd0
Dec 14 08:44:15  kernel: [418189.396991]  swap_readpage+0xa9/0x250
Dec 14 08:44:15  kernel: [418189.396991]  swap_cluster_readahead+0x1ce/0x2c0
Dec 14 08:44:15  kernel: [418189.396991]  swapin_readahead+0x5c/0x420
Dec 14 08:44:15  kernel: [418189.396991]  ? pagecache_get_page+0x28/0x260
Dec 14 08:44:15  kernel: [418189.396991]  do_swap_page+0x554/0x980
Dec 14 08:44:15  kernel: [418189.396991]  handle_mm_fault+0xc5f/0x1920
Dec 14 08:44:15  kernel: [418189.396991]  do_user_addr_fault+0x1b8/0x3f0
Dec 14 08:44:15  kernel: [418189.396991]  exc_page_fault+0x79/0x180
Dec 14 08:44:15  kernel: [418189.396991]  ? asm_exc_page_fault+0x8/0x30
Dec 14 08:44:15  kernel: [418189.396991]  asm_exc_page_fault+0x1e/0x30
Dec 14 08:44:15  kernel: [418189.396991] RIP: 0033:0x7fd805c8ff27
Dec 14 08:44:15  kernel: [418189.396991] Code: 47 20 c5 fe 7f 44 17 c0 c5 fe 7f 47 40 c5 fe 7f 44 17 a0 c5 fe 7f 47 60 c5 fe 7f 44 17 80 48 01 fa 48 83 e2 80 48 39 d1 74 ba <c5> fd 7f 01 c5 fd 7f 41 20 c5 fd 7f 41 40 c5 fd 7f 41 60 48 81 c1
Dec 14 08:44:15  kernel: [418189.396991] RSP: 002b:00007fc723fee818 EFLAGS: 00010206
Dec 14 08:44:15  kernel: [418189.396991] RAX: 00007fc7c2300158 RBX: 00000000000605d8 RCX: 00007fc7c230a000
Dec 14 08:44:15  kernel: [418189.396991] RDX: 00007fc7c230c100 RSI: 0000000000000000 RDI: 00007fc7c2300158
Dec 14 08:44:15  kernel: [418189.396991] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000545d8
Dec 14 08:44:15  kernel: [418189.396991] R10: 0000000000000000 R11: 0000000000000000 R12: 00007fc7982040c8
Dec 14 08:44:15  kernel: [418189.396991] R13: 000000000000c000 R14: 00000000000545d8 R15: 0000000000802000
Dec 14 08:44:15  kernel: [418189.396991] Modules linked in: bcache crc64 fuse zram dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio loop dm_crypt xfs z3fold dm_mod st sr_mod cdrom nf_tables xt_multiport nfnetlink iptable_filter bridge stp l$
Dec 14 08:44:15  kernel: [418189.420051] ---[ end trace 5749dfa1755522d2 ]---
Dec 14 08:44:15  kernel: [418189.420755] RIP: 0010:zswap_frontswap_load+0x1a5/0x1b0
Dec 14 08:44:15  kernel: [418189.421452] Code: 89 fe e8 7e d5 28 00 41 89 c4 83 ab 00 0a 00 00 01 48 8b 45 28 48 8b 75 30 48 8b 38 e8 e4 81 03 00 45 85 e4 0f 84 2b ff ff ff <0f> 0b e8 04 7f 6d 00 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56 49 89
Dec 14 08:44:15  kernel: [418189.424609] RSP: 0000:ffffc90008df3bc0 EFLAGS: 00010282
Dec 14 08:44:15  kernel: [418189.425306] RAX: 0000000000000001 RBX: ffff8880a55f1d80 RCX: ffff88824d6a0cc8
Dec 14 08:44:15  kernel: [418189.425979] RDX: ffffea0000000000 RSI: 0000000000000000 RDI: ffff88824d6a0010
Dec 14 08:44:15  kernel: [418189.426643] RBP: ffff88811c048b60 R08: ffff88821792ba40 R09: ffff88824d6a0cc8
Dec 14 08:44:15  kernel: [418189.427293] R10: 000000000000032a R11: 0000000000000000 R12: 00000000ffffffea
Dec 14 08:44:15  kernel: [418189.427938] R13: ffff888233667bc0 R14: ffff888233667bc8 R15: ffff88824d6a0cc8
Dec 14 08:44:15  kernel: [418189.428571] FS:  00007fc723fef700(0000) GS:ffff88826e080000(0000) knlGS:0000000000000000
Dec 14 08:44:15  kernel: [418189.429212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 14 08:44:15  kernel: [418189.429836] CR2: 00007fc7c230a000 CR3: 0000000239504000 CR4: 00000000003506e0
```
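
The logs in this thread come with different prefixes (bare dmesg timestamps, syslog lines, tab-separated journal exports), which makes them awkward to compare by eye. A small Python sketch, with hypothetical names, that pulls out the BUG location and the kernel-mode RIP symbol and offset regardless of prefix. It assumes the `kernel BUG at file:line!` and `RIP: 0010:symbol+0xOFF/0xLEN` shapes seen in the reports above.

```python
import re

# Patterns modelled on the oops lines quoted in this thread.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")

def summarize_oops(log_text):
    """Return (source_file, line, symbol, offset) for the first BUG/RIP pair, or None."""
    bug = BUG_RE.search(log_text)
    rip = RIP_RE.search(log_text)
    if not bug or not rip:
        return None
    return (bug.group(1), int(bug.group(2)), rip.group(1), int(rip.group(2), 16))

sample = "\n".join([
    "30.10.20 03:52\tkernel\tkernel BUG at mm/zswap.c:1184!",
    "30.10.20 03:52\tkernel\tRIP: 0010:zswap_frontswap_load+0x240/0x260",
    "30.10.20 03:52\tkernel\tRIP: 0033:0x55e3113bcde3",
])
print(summarize_oops(sample))
```

The user-space `RIP: 0033:` line is deliberately not matched, so only the kernel fault site is reported.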
Comment 19 Zdenek Sojka 2021-01-06 19:09:02 UTC
(In reply to Vitaly from comment #17)
> Could you please apply the following patches and retest:
> https://marc.info/?l=linux-mm&m=160752554706352&w=2 (1)
> https://marc.info/?l=linux-mm&m=160752556606357&w=2 (2)
> https://marc.info/?l=linux-mm&m=160752557106358&w=2 (3)
> 
> Patch (2) is not that relevant here, but it's better if you apply it too
> just to be consistent.

Thank you for the patches. I tried to download them and fix them up so they could be applied (the patch program complains that the patch is malformed), but they did not apply to the 5.4 kernel. On the other hand, (1) and (3) seem to be already included in 5.10.4; I did not observe any problems with zswap@z3fold when compiling thunderbird and firefox with -j4, which almost always ended in a panic with the old kernel.
I have not checked zsmalloc with the new kernel yet, though.
Comment 20 Zdenek Sojka 2021-03-03 14:28:17 UTC
(In reply to Zdenek Sojka from comment #19)
> (In reply to Vitaly from comment #17)
> > Could you please apply the following patches and retest:
> > https://marc.info/?l=linux-mm&m=160752554706352&w=2 (1)
> > https://marc.info/?l=linux-mm&m=160752556606357&w=2 (2)
> > https://marc.info/?l=linux-mm&m=160752557106358&w=2 (3)
> > 
> > Patch (2) is not that relevant here, but it's better if you apply it too
> > just to be consistent.
> 
> Thank you for the patches. I tried to download them and fix them so they
> could be applied (the patch program complains the patch is malformed), but
> they didn't apply to the 5.4 kernel. On the other hand, (1) and (3) seem to
> be already included in 5.10.4 ; I can tell I didn't observe any problems
> with zswap@z3fold when compiling thunderbird and firefox -j4, which was
> almost always ending in panic with the old kernel.
> I didn't check zsmalloc yet with the new kernel though.

Still no issues with zswap@z3fold on recent 5.10.y and 5.11.2 kernels. zswap@zsmalloc is failing on 5.11.2, but it's a separate issue.
