Bug 180101 - BUG: unable to handle kernel paging request at x with 4.8.3 and 4.8.4
Summary: BUG: unable to handle kernel paging request at x with 4.8.3 and 4.8.4
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: x86-64 Linux
: P1 high
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-24 01:27 UTC by Joe Yasi
Modified: 2016-11-10 14:48 UTC (History)
0 users

See Also:
Kernel Version: 4.8.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
attachment-26294-0.html (3.50 KB, text/html)
2016-10-28 22:25 UTC, Joe Yasi
Details
attachment-30075-0.html (9.37 KB, text/html)
2016-10-29 01:00 UTC, Joe Yasi
Details

Description Joe Yasi 2016-10-24 01:27:15 UTC
After updating to 4.8.3 and 4.8.4, I am having stability issues. I can also reproduce them with 4.7.10. This issue does not occur with 4.8.2. I can also not reproduce after reverting the security fix 89eeba1594ac641a30b91942961e80fae978f839 "mm: remove gup_flags FOLL_WRITE games from __get_user_pages()" with 4.8.4.

This seems to happen when I'm doing an operation that mmaps a bunch of small files like building the kernel or cross-compiling an image for my router. I've been able to reliably reproduce it by running "sudo lsof +D /home" a few times while building a kernel located in my /home partition. The /home partition is btrfs on bcache. The bcache cache device is an Intel SATA SSD, and the backing device is an MD RAID 1 array of two 3 TB hard drives.

I tried to fix it by using the can_follow_write_pte function from the 3.2.83 version of the patch (243f858d7045b710a31c377112578387ead4dde1) which checks PageAnon and !PageKsm instead of pte_dirty, and can't reproduce the issue. If I switch back to the mainline 4.8.4 version, I can reproduce it reliably.

Any idea why pte_dirty is having an issue for me? Something in the btrfs/bcache/md layer?

This bug is triggered:
Oct 23 14:07:52 hostname kernel: [ 4880.464356] BUG: unable to handle kernel paging request at 00007facb85592b6
Oct 23 14:07:52 hostname kernel: [ 4880.464413] IP: [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
Oct 23 14:07:52 hostname kernel: [ 4880.464463] PGD 7cee19067 PUD 0 
Oct 23 14:07:52 hostname kernel: [ 4880.464497] Oops: 0002 [#1] PREEMPT SMP
Oct 23 14:07:52 hostname kernel: [ 4880.464526] Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core snd_usb_audio videodev snd_usbmidi_lib media snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss snd_mixer_oss snd_pcm input_leds intel_rapl x86_pkg_temp_thermal btusb intel_powerclamp crct10dif_pclmul btrtl btbcm efi_pstore crc32_pclmul btintel crc32c_intel bluetooth ghash_clmulni_intel aesni_intel aes_x86_64 snd_seq_oss lrw glue_helper ablk_helper cryptd intel_cstate snd_seq_midi snd_rawmidi intel_rapl_perf snd_seq_midi_event snd_seq efivars snd_seq_device snd_timer snd soundcore wl(PO) cfg80211 rfkill sg battery intel_lpss_acpi intel_lpss mfd_core acpi_pad tpm_tis acpi_als tpm_tis_core kfifo_buf tpm industrialio nfsd auth_rpcgss coretemp oid_registry nfs_acl lockd loop grace sunrpc efivarfs ipv6 crc_ccitt hid_generic usbhid uas usb_storage igb e1000e dca ptp mxm_wmi bcache psmouse i915 intel_gtt pps_core drm_kms_helper xhci_pci hwmon syscopyarea xhci_hcd sysfillrect sysimgblt i2c_algo_bit fb_sys_fops usbcore sr_mod drm cdrom i2c_core usb_common fan thermal pinctrl_sunrisepoint wmi video pinctrl_intel button
Oct 23 14:07:52 hostname kernel: [ 4880.465490] CPU: 3 PID: 1139 Comm: lsof Tainted: P           O    4.8.3-customskl #1
Oct 23 14:07:52 hostname kernel: [ 4880.465540] Hardware name: System manufacturer System Product Name/Z170-DELUXE, BIOS 2202 09/19/2016
Oct 23 14:07:52 hostname kernel: [ 4880.465599] task: ffff9e4a40062640 task.stack: ffff9e468ef80000
Oct 23 14:07:52 hostname kernel: [ 4880.465643] RIP: 0010:[<ffffffffa1082731>]  [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
Oct 23 14:07:52 hostname kernel: [ 4880.465705] RSP: 0018:ffff9e468ef83d00  EFLAGS: 00010202
Oct 23 14:07:52 hostname kernel: [ 4880.465742] RAX: 0000000000001fff RBX: ffff9e494f7f2718 RCX: ffff9e4b2ecd6ec0
Oct 23 14:07:52 hostname kernel: [ 4880.465789] RDX: 00007facb85592b6 RSI: 0000000080000000 RDI: ffff9e494f7f2718
Oct 23 14:07:52 hostname kernel: [ 4880.465836] RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000000
Oct 23 14:07:52 hostname kernel: [ 4880.465887] R10: 0000000020ab886e R11: ffff9e494f7f26f8 R12: 0000000000000000
Oct 23 14:07:52 hostname kernel: [ 4880.465934] R13: ffff9e494f7f26c0 R14: ffff9e468ef83d90 R15: 0000000000000000
Oct 23 14:07:52 hostname kernel: [ 4880.465981] FS:  00007f3344595800(0000) GS:ffff9e4b2ecc0000(0000) knlGS:0000000000000000
Oct 23 14:07:52 hostname kernel: [ 4880.466033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 14:07:52 hostname kernel: [ 4880.468745] CR2: 00007facb85592b6 CR3: 00000007daf09000 CR4: 00000000003406e0
Oct 23 14:07:52 hostname kernel: [ 4880.471484] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 23 14:07:52 hostname kernel: [ 4880.474221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 23 14:07:52 hostname kernel: [ 4880.476930] Stack:
Oct 23 14:07:52 hostname kernel: [ 4880.479644]  ffffffffa13680aa 0000000000000000 ffff9e494f7f26c0 ffffffffa1162f3e
Oct 23 14:07:52 hostname kernel: [ 4880.482419]  ffff9e468ef83d90 ffff9e494f7f26c0 ffff9e468ef83ea0 0000000000000000
Oct 23 14:07:52 hostname kernel: [ 4880.485205]  ffff9e468ef83ea0 ffffffffa11632de ffff9e468ef83d90 ffff9e468ef83d80
Oct 23 14:07:52 hostname kernel: [ 4880.487984] Call Trace:
Oct 23 14:07:52 hostname kernel: [ 4880.490774]  [<ffffffffa13680aa>] ? lockref_get_not_dead+0x3a/0x80
Oct 23 14:07:52 hostname kernel: [ 4880.493558]  [<ffffffffa1162f3e>] ? unlazy_walk+0xee/0x180
Oct 23 14:07:52 hostname kernel: [ 4880.496362]  [<ffffffffa11632de>] ? complete_walk+0x2e/0x70
Oct 23 14:07:52 hostname kernel: [ 4880.499166]  [<ffffffffa1165453>] ? path_lookupat+0x93/0x100
Oct 23 14:07:52 hostname kernel: [ 4880.501977]  [<ffffffffa1167859>] ? filename_lookup+0x99/0x150
Oct 23 14:07:52 hostname kernel: [ 4880.504767]  [<ffffffffa1160d0e>] ? pipe_read+0x27e/0x340
Oct 23 14:07:52 hostname kernel: [ 4880.507550]  [<ffffffffa11674fa>] ? getname_flags+0x6a/0x1d0
Oct 23 14:07:52 hostname kernel: [ 4880.510278]  [<ffffffffa115d4c4>] ? vfs_fstatat+0x44/0x90
Oct 23 14:07:52 hostname kernel: [ 4880.513049]  [<ffffffffa115d96d>] ? SYSC_newlstat+0x1d/0x40
Oct 23 14:07:52 hostname kernel: [ 4880.515844]  [<ffffffffa1158b12>] ? vfs_read+0x112/0x130
Oct 23 14:07:52 hostname kernel: [ 4880.518654]  [<ffffffffa1159d7d>] ? SyS_read+0x3d/0x90
Oct 23 14:07:52 hostname kernel: [ 4880.521437]  [<ffffffffa15ff41f>] ? entry_SYSCALL_64_fastpath+0x17/0x93
Oct 23 14:07:52 hostname kernel: [ 4880.524245] Code: c1 e0 10 45 31 c9 85 c0 74 44 48 89 c2 c1 e8 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 c0 6e 01 00 48 03 14 c5 a0 21 a7 a1 <48> 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b 
Oct 23 14:07:52 hostname kernel: [ 4880.527307] RIP  [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
Oct 23 14:07:52 hostname kernel: [ 4880.530277]  RSP <ffff9e468ef83d00>
Oct 23 14:07:52 hostname kernel: [ 4880.533247] CR2: 00007facb85592b6
Oct 23 14:07:52 hostname kernel: [ 4880.536159] ---[ end trace 78d7ef040dfa41d9 ]---
Oct 23 14:07:52 hostname kernel: [ 4880.638571] note: lsof[1139] exited with preempt_count 1
Oct 23 14:08:52 hostname kernel: [ 4940.464837] INFO: rcu_preempt detected stalls on CPUs/tasks:
Oct 23 14:08:52 hostname kernel: [ 4940.468014] 	Tasks blocked on level-0 rcu_node (CPUs 0-7): P1139
Oct 23 14:08:52 hostname kernel: [ 4940.471228] 	(detected by 2, t=60002 jiffies, g=469992, c=469991, q=109361)
Oct 23 14:08:52 hostname kernel: [ 4940.474458] lsof            x 0000000000030001     0  1139      0 0x00000002
Oct 23 14:08:52 hostname kernel: [ 4940.477714]  ffff9e4a40062640 ffff9e4796162e00 ffff9e4a40062640 ffff9e468ef84000
Oct 23 14:08:52 hostname kernel: [ 4940.480966]  ffff9e468ef83f10 ffff9e468ef83b18 ffff9e4a40062a00 00007facb85592b6
Oct 23 14:08:52 hostname kernel: [ 4940.484228]  0000000000030001 ffffffffa15fb513 ffff9e4a40062640 ffffffffa104e5c8
Oct 23 14:08:52 hostname kernel: [ 4940.487471] Call Trace:
Oct 23 14:08:52 hostname kernel: [ 4940.490683]  [<ffffffffa15fb513>] ? schedule+0x33/0x90
Oct 23 14:08:52 hostname kernel: [ 4940.493919]  [<ffffffffa104e5c8>] ? do_exit+0x708/0xac0
Oct 23 14:08:52 hostname kernel: [ 4940.497169]  [<ffffffffa1600d47>] ? rewind_stack_do_exit+0x17/0x20
Oct 23 14:08:52 hostname kernel: [ 4940.500409] lsof            x 0000000000030001     0  1139      0 0x00000002
Oct 23 14:08:52 hostname kernel: [ 4940.503676]  ffff9e4a40062640 ffff9e4796162e00 ffff9e4a40062640 ffff9e468ef84000
Oct 23 14:08:52 hostname kernel: [ 4940.506964]  ffff9e468ef83f10 ffff9e468ef83b18 ffff9e4a40062a00 00007facb85592b6
Oct 23 14:08:52 hostname kernel: [ 4940.510230]  0000000000030001 ffffffffa15fb513 ffff9e4a40062640 ffffffffa104e5c8
Oct 23 14:08:52 hostname kernel: [ 4940.513532] Call Trace:
Oct 23 14:08:52 hostname kernel: [ 4940.516807]  [<ffffffffa15fb513>] ? schedule+0x33/0x90
Oct 23 14:08:52 hostname kernel: [ 4940.520129]  [<ffffffffa104e5c8>] ? do_exit+0x708/0xac0
Oct 23 14:08:52 hostname kernel: [ 4940.523457]  [<ffffffffa1600d47>] ? rewind_stack_do_exit+0x17/0x20
Oct 23 14:10:10 hostname kernel: [ 5018.824396] INFO: rcu_sched self-detected stall on CPU
Oct 23 14:10:10 hostname kernel: [ 5018.826669] 	2-...: (59999 ticks this GP) idle=609/140000000000001/0 softirq=954063/954063 fqs=15000 
Oct 23 14:10:10 hostname kernel: [ 5018.828924] 	 (t=60000 jiffies g=333 c=332 q=1)
Oct 23 14:10:10 hostname kernel: [ 5018.831185] Task dump for CPU 2:
Oct 23 14:10:10 hostname kernel: [ 5018.833406] lsof            R  running task        0  1213      1 0x0000000c
Oct 23 14:10:10 hostname kernel: [ 5018.835688]  ffffffffa1a29f80 ffffffffa10ed903 ffff9e4b2ec97240 ffffffffa1a29f80
Oct 23 14:10:10 hostname kernel: [ 5018.837940]  0000000000000000 ffff9e4a13a9cc80 ffffffffa109c05a ffffffffa10a45f9
Oct 23 14:10:10 hostname kernel: [ 5018.840205]  003b9aca00000000 ffff9e4b2ec83ee0 0000000000000086 ffffffffa10a5c5c
Oct 23 14:10:10 hostname kernel: [ 5018.842491] Call Trace:
Oct 23 14:10:10 hostname kernel: [ 5018.844706]  <IRQ>  [<ffffffffa10ed903>] ? rcu_dump_cpu_stacks+0x88/0xaa
Oct 23 14:10:10 hostname kernel: [ 5018.846997]  [<ffffffffa109c05a>] ? rcu_check_callbacks+0x69a/0x8d0
Oct 23 14:10:10 hostname kernel: [ 5018.849251]  [<ffffffffa10a45f9>] ? timekeeping_update+0xe9/0x140
Oct 23 14:10:10 hostname kernel: [ 5018.851535]  [<ffffffffa10a5c5c>] ? update_wall_time+0x45c/0x770
Oct 23 14:10:10 hostname kernel: [ 5018.853790]  [<ffffffffa10accb0>] ? tick_sched_handle.isra.14+0x30/0x30
Oct 23 14:10:10 hostname kernel: [ 5018.856086]  [<ffffffffa109ef73>] ? update_process_times+0x23/0x50
Oct 23 14:10:10 hostname kernel: [ 5018.858336]  [<ffffffffa10acce3>] ? tick_sched_timer+0x33/0x60
Oct 23 14:10:10 hostname kernel: [ 5018.860593]  [<ffffffffa109f9a9>] ? __hrtimer_run_queues+0xb9/0x150
Oct 23 14:10:10 hostname kernel: [ 5018.862810]  [<ffffffffa109fc35>] ? hrtimer_interrupt+0x95/0x190
Oct 23 14:10:10 hostname kernel: [ 5018.865048]  [<ffffffffa10327de>] ? smp_trace_apic_timer_interrupt+0x5e/0x90
Oct 23 14:10:10 hostname kernel: [ 5018.867302]  [<ffffffffa160000f>] ? apic_timer_interrupt+0x7f/0x90
Oct 23 14:10:10 hostname kernel: [ 5018.869557]  <EOI>  [<ffffffffa108273d>] ? queued_spin_lock_slowpath+0xed/0x170
Oct 23 14:10:10 hostname kernel: [ 5018.871826]  [<ffffffffa13680aa>] ? lockref_get_not_dead+0x3a/0x80
Oct 23 14:10:10 hostname kernel: [ 5018.874097]  [<ffffffffa1162f3e>] ? unlazy_walk+0xee/0x180
Oct 23 14:10:10 hostname kernel: [ 5018.876372]  [<ffffffffa11632de>] ? complete_walk+0x2e/0x70
Oct 23 14:10:10 hostname kernel: [ 5018.878684]  [<ffffffffa1165453>] ? path_lookupat+0x93/0x100
Oct 23 14:10:10 hostname kernel: [ 5018.880953]  [<ffffffffa1167859>] ? filename_lookup+0x99/0x150
Oct 23 14:10:10 hostname kernel: [ 5018.883225]  [<ffffffffa1160d0e>] ? pipe_read+0x27e/0x340
Oct 23 14:10:10 hostname kernel: [ 5018.885532]  [<ffffffffa11674fa>] ? getname_flags+0x6a/0x1d0
Oct 23 14:10:10 hostname kernel: [ 5018.887824]  [<ffffffffa115d4c4>] ? vfs_fstatat+0x44/0x90
Oct 23 14:10:10 hostname kernel: [ 5018.890126]  [<ffffffffa115d96d>] ? SYSC_newlstat+0x1d/0x40
Oct 23 14:10:10 hostname kernel: [ 5018.892443]  [<ffffffffa1158b12>] ? vfs_read+0x112/0x130
Oct 23 14:10:10 hostname kernel: [ 5018.894717]  [<ffffffffa1159d7d>] ? SyS_read+0x3d/0x90
Oct 23 14:10:10 hostname kernel: [ 5018.896912]  [<ffffffffa15ff41f>] ? entry_SYSCALL_64_fastpath+0x17/0x93
Comment 1 Joe Yasi 2016-10-24 02:22:02 UTC
I just reproduced it on 4.8.4 with the PageAnon and !PageKsm version of can_follow_write_pte function. The issue still happens, it's just less frequent.
Comment 2 Joe Yasi 2016-10-24 03:10:44 UTC
I also just reproduced it on 4.8.4 with 89eeba1594ac641a30b91942961e80fae978f839 completely reverted. It's just more more difficult.
Comment 3 Joe Yasi 2016-10-25 00:57:10 UTC
I've also seen this bug with a similar workflow: cross-compiling a router firmware image on a btrfs volume on bcache on top of an SATA SSD and an MD RAID 1 volume. Given that I've been able to reproduce the first BUG with or without the security fix, and it occurs during IO load, I think BTRFS might be a more appropriate bug component for this.

Oct 22 14:55:36 fishasshole kernel: [ 1722.616862] BUG: Bad page map in process as  pte:80000009a2711025 pmd:8a78d3067
Oct 22 14:55:36 fishasshole kernel: [ 1722.616864] page:ffffea002689c440 count:3 mapcount:-2048 mapping:ffff9d706dd418a8 index:0x25
Oct 22 14:55:36 fishasshole kernel: [ 1722.616866] flags: 0x800000000000086c(referenced|uptodate|lru|active|private)
Oct 22 14:55:36 fishasshole kernel: [ 1722.616866] page dumped because: bad pte
Oct 22 14:55:36 fishasshole kernel: [ 1722.616867] page->mem_cgroup:ffff9d7078012000
Oct 22 14:55:36 fishasshole kernel: [ 1722.616868] addr:00000000f76ee000 vm_flags:00000071 anon_vma:          (null) mapping:ffff9d706dd418a8 index:25
Oct 22 14:55:36 fishasshole kernel: [ 1722.616872] file:ld.so.cache fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
Oct 22 14:55:36 fishasshole kernel: [ 1722.616873] CPU: 5 PID: 32512 Comm: as Tainted: P           O    4.8.4-customskl #1
Oct 22 14:55:36 fishasshole kernel: [ 1722.616874] Hardware name: System manufacturer System Product Name/Z170-DELUXE, BIOS 2202 09/19/2016
Oct 22 14:55:36 fishasshole kernel: [ 1722.616874]  0000000000000000 ffffffff8b35b4ad 00000000f76ee000 ffff9d6f86f6df20
Oct 22 14:55:36 fishasshole kernel: [ 1722.616875]  ffffffff8b1168f7 ffffea0031ba1040 ffffffff8b14b34c ffffea002689c440
Oct 22 14:55:36 fishasshole kernel: [ 1722.616876]  00000000f76ee000 ffffea002689c440 ffff9d6ce78d3770 ffff9d6ce6e3be00
Oct 22 14:55:36 fishasshole kernel: [ 1722.616877] Call Trace:
Oct 22 14:55:36 fishasshole kernel: [ 1722.616880]  [<ffffffff8b35b4ad>] ? dump_stack+0x46/0x59
Oct 22 14:55:36 fishasshole kernel: [ 1722.616881]  [<ffffffff8b1168f7>] ? print_bad_pte+0x197/0x270
Oct 22 14:55:36 fishasshole kernel: [ 1722.616883]  [<ffffffff8b14b34c>] ? lock_page_memcg+0xc/0x80
Oct 22 14:55:36 fishasshole kernel: [ 1722.616884]  [<ffffffff8b118de8>] ? unmap_page_range+0x638/0x810
Oct 22 14:55:36 fishasshole kernel: [ 1722.616885]  [<ffffffff8b0fc1fe>] ? __pagevec_lru_add_fn+0x11e/0x1c0
Oct 22 14:55:36 fishasshole kernel: [ 1722.616886]  [<ffffffff8b119317>] ? unmap_vmas+0x47/0x90
Oct 22 14:55:36 fishasshole kernel: [ 1722.616887]  [<ffffffff8b11ec76>] ? unmap_region+0x96/0xf0
Oct 22 14:55:36 fishasshole kernel: [ 1722.616888]  [<ffffffff8b12072e>] ? do_munmap+0x27e/0x460
Oct 22 14:55:36 fishasshole kernel: [ 1722.616889]  [<ffffffff8b1209b1>] ? SyS_munmap+0x41/0x60
Oct 22 14:55:36 fishasshole kernel: [ 1722.616890]  [<ffffffff8b0016d7>] ? do_int80_syscall_32+0x47/0x90
Oct 22 14:55:36 fishasshole kernel: [ 1722.616891]  [<ffffffff8b600e5a>] ? entry_INT80_compat+0x2a/0x40
Comment 4 Andrew Morton 2016-10-28 21:52:20 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 24 Oct 2016 01:27:15 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=180101
> 
>             Bug ID: 180101
>            Summary: BUG: unable to handle kernel paging request at x with
>                     "mm: remove gup_flags FOLL_WRITE games from
>                     __get_user_pages()"
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.8.4
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: joe.yasi@gmail.com
>         Regression: No
> 
> After updating to 4.8.3 and 4.8.4, I am having stability issues. I can also
> reproduce them with 4.7.10. This issue does not occur with 4.8.2. I can also
> not reproduce after reverting the security fix
> 89eeba1594ac641a30b91942961e80fae978f839 "mm: remove gup_flags FOLL_WRITE
> games
> from __get_user_pages()" with 4.8.4.

That's 19be0eaffa3ac7d8eb ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()") in the upstream tree.

I seem to recall a fix for that patch went flying past earlier this
week.  Perhaps Linus recalls?

19be0eaffa3ac7d8eb has gone into a billion -stable trees so we'll need
to be attentive...


> This seems to happen when I'm doing an operation that mmaps a bunch of small
> files like building the kernel or cross-compiling an image for my router.
> I've
> been able to reliably reproduce it by running "sudo lsof +D /home" a few
> times
> while building a kernel located in my /home partition. The /home partition is
> btrfs on bcache. The bcache cache device is an Intel SATA SSD, and the
> backing
> device is an MD RAID 1 array of two 3 TB hard drives.
> 
> I tried to fix it by using the can_follow_write_pte function from the 3.2.83
> version of the patch (243f858d7045b710a31c377112578387ead4dde1) which checks
> PageAnon and !PageKsm instead of pte_dirty, and can't reproduce the issue. If
> I
> switch back to the mainline 4.8.4 version, I can reproduce it reliably.
> 
> Any idea why pte_dirty is having an issue for me? Something in the
> btrfs/bcache/md layer?
> 
> This bug is triggered:
> Oct 23 14:07:52 hostname kernel: [ 4880.464356] BUG: unable to handle kernel
> paging request at 00007facb85592b6
> Oct 23 14:07:52 hostname kernel: [ 4880.464413] IP: [<ffffffffa1082731>]
> queued_spin_lock_slowpath+0xe1/0x170
> Oct 23 14:07:52 hostname kernel: [ 4880.464463] PGD 7cee19067 PUD 0 
> Oct 23 14:07:52 hostname kernel: [ 4880.464497] Oops: 0002 [#1] PREEMPT SMP
> Oct 23 14:07:52 hostname kernel: [ 4880.464526] Modules linked in: pci_stub
> vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) rfcomm bnep binfmt_misc
> vfat
> fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic uvcvideo
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
> snd_usb_audio
> videodev snd_usbmidi_lib media snd_hda_intel snd_hda_codec snd_hwdep
> snd_hda_core snd_pcm_oss snd_mixer_oss snd_pcm input_leds intel_rapl
> x86_pkg_temp_thermal btusb intel_powerclamp crct10dif_pclmul btrtl btbcm
> efi_pstore crc32_pclmul btintel crc32c_intel bluetooth ghash_clmulni_intel
> aesni_intel aes_x86_64 snd_seq_oss lrw glue_helper ablk_helper cryptd
> intel_cstate snd_seq_midi snd_rawmidi intel_rapl_perf snd_seq_midi_event
> snd_seq efivars snd_seq_device snd_timer snd soundcore wl(PO) cfg80211 rfkill
> sg battery intel_lpss_acpi intel_lpss mfd_core acpi_pad tpm_tis acpi_als
> tpm_tis_core kfifo_buf tpm industrialio nfsd auth_rpcgss coretemp
> oid_registry
> nfs_acl lockd loop grace sunrpc efivarfs ipv6 crc_ccitt hid_generic usbhid
> uas
> usb_storage igb e1000e dca ptp mxm_wmi bcache psmouse i915 intel_gtt pps_core
> drm_kms_helper xhci_pci hwmon syscopyarea xhci_hcd sysfillrect sysimgblt
> i2c_algo_bit fb_sys_fops usbcore sr_mod drm cdrom i2c_core usb_common fan
> thermal pinctrl_sunrisepoint wmi video pinctrl_intel button
> Oct 23 14:07:52 hostname kernel: [ 4880.465490] CPU: 3 PID: 1139 Comm: lsof
> Tainted: P           O    4.8.3-customskl #1
> Oct 23 14:07:52 hostname kernel: [ 4880.465540] Hardware name: System
> manufacturer System Product Name/Z170-DELUXE, BIOS 2202 09/19/2016
> Oct 23 14:07:52 hostname kernel: [ 4880.465599] task: ffff9e4a40062640
> task.stack: ffff9e468ef80000
> Oct 23 14:07:52 hostname kernel: [ 4880.465643] RIP:
> 0010:[<ffffffffa1082731>] 
> [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
> Oct 23 14:07:52 hostname kernel: [ 4880.465705] RSP: 0018:ffff9e468ef83d00 
> EFLAGS: 00010202
> Oct 23 14:07:52 hostname kernel: [ 4880.465742] RAX: 0000000000001fff RBX:
> ffff9e494f7f2718 RCX: ffff9e4b2ecd6ec0
> Oct 23 14:07:52 hostname kernel: [ 4880.465789] RDX: 00007facb85592b6 RSI:
> 0000000080000000 RDI: ffff9e494f7f2718
> Oct 23 14:07:52 hostname kernel: [ 4880.465836] RBP: 0000000000000000 R08:
> 0000000000100000 R09: 0000000000000000
> Oct 23 14:07:52 hostname kernel: [ 4880.465887] R10: 0000000020ab886e R11:
> ffff9e494f7f26f8 R12: 0000000000000000
> Oct 23 14:07:52 hostname kernel: [ 4880.465934] R13: ffff9e494f7f26c0 R14:
> ffff9e468ef83d90 R15: 0000000000000000
> Oct 23 14:07:52 hostname kernel: [ 4880.465981] FS:  00007f3344595800(0000)
> GS:ffff9e4b2ecc0000(0000) knlGS:0000000000000000
> Oct 23 14:07:52 hostname kernel: [ 4880.466033] CS:  0010 DS: 0000 ES: 0000
> CR0: 0000000080050033
> Oct 23 14:07:52 hostname kernel: [ 4880.468745] CR2: 00007facb85592b6 CR3:
> 00000007daf09000 CR4: 00000000003406e0
> Oct 23 14:07:52 hostname kernel: [ 4880.471484] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Oct 23 14:07:52 hostname kernel: [ 4880.474221] DR3: 0000000000000000 DR6:
> 00000000fffe0ff0 DR7: 0000000000000400
> Oct 23 14:07:52 hostname kernel: [ 4880.476930] Stack:
> Oct 23 14:07:52 hostname kernel: [ 4880.479644]  ffffffffa13680aa
> 0000000000000000 ffff9e494f7f26c0 ffffffffa1162f3e
> Oct 23 14:07:52 hostname kernel: [ 4880.482419]  ffff9e468ef83d90
> ffff9e494f7f26c0 ffff9e468ef83ea0 0000000000000000
> Oct 23 14:07:52 hostname kernel: [ 4880.485205]  ffff9e468ef83ea0
> ffffffffa11632de ffff9e468ef83d90 ffff9e468ef83d80
> Oct 23 14:07:52 hostname kernel: [ 4880.487984] Call Trace:
> Oct 23 14:07:52 hostname kernel: [ 4880.490774]  [<ffffffffa13680aa>] ?
> lockref_get_not_dead+0x3a/0x80
> Oct 23 14:07:52 hostname kernel: [ 4880.493558]  [<ffffffffa1162f3e>] ?
> unlazy_walk+0xee/0x180
> Oct 23 14:07:52 hostname kernel: [ 4880.496362]  [<ffffffffa11632de>] ?
> complete_walk+0x2e/0x70
> Oct 23 14:07:52 hostname kernel: [ 4880.499166]  [<ffffffffa1165453>] ?
> path_lookupat+0x93/0x100
> Oct 23 14:07:52 hostname kernel: [ 4880.501977]  [<ffffffffa1167859>] ?
> filename_lookup+0x99/0x150
> Oct 23 14:07:52 hostname kernel: [ 4880.504767]  [<ffffffffa1160d0e>] ?
> pipe_read+0x27e/0x340
> Oct 23 14:07:52 hostname kernel: [ 4880.507550]  [<ffffffffa11674fa>] ?
> getname_flags+0x6a/0x1d0
> Oct 23 14:07:52 hostname kernel: [ 4880.510278]  [<ffffffffa115d4c4>] ?
> vfs_fstatat+0x44/0x90
> Oct 23 14:07:52 hostname kernel: [ 4880.513049]  [<ffffffffa115d96d>] ?
> SYSC_newlstat+0x1d/0x40
> Oct 23 14:07:52 hostname kernel: [ 4880.515844]  [<ffffffffa1158b12>] ?
> vfs_read+0x112/0x130
> Oct 23 14:07:52 hostname kernel: [ 4880.518654]  [<ffffffffa1159d7d>] ?
> SyS_read+0x3d/0x90
> Oct 23 14:07:52 hostname kernel: [ 4880.521437]  [<ffffffffa15ff41f>] ?
> entry_SYSCALL_64_fastpath+0x17/0x93
> Oct 23 14:07:52 hostname kernel: [ 4880.524245] Code: c1 e0 10 45 31 c9 85 c0
> 74 44 48 89 c2 c1 e8 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 c0 6e 01
> 00 48 03 14 c5 a0 21 a7 a1 <48> 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85
> c0
> 74 f7 4c 8b 
> Oct 23 14:07:52 hostname kernel: [ 4880.527307] RIP  [<ffffffffa1082731>]
> queued_spin_lock_slowpath+0xe1/0x170
> Oct 23 14:07:52 hostname kernel: [ 4880.530277]  RSP <ffff9e468ef83d00>
> Oct 23 14:07:52 hostname kernel: [ 4880.533247] CR2: 00007facb85592b6
> Oct 23 14:07:52 hostname kernel: [ 4880.536159] ---[ end trace
> 78d7ef040dfa41d9
> ]---
> Oct 23 14:07:52 hostname kernel: [ 4880.638571] note: lsof[1139] exited with
> preempt_count 1
> Oct 23 14:08:52 hostname kernel: [ 4940.464837] INFO: rcu_preempt detected
> stalls on CPUs/tasks:
> Oct 23 14:08:52 hostname kernel: [ 4940.468014]     Tasks blocked on level-0
> rcu_node (CPUs 0-7): P1139
> Oct 23 14:08:52 hostname kernel: [ 4940.471228]     (detected by 2, t=60002
> jiffies, g=469992, c=469991, q=109361)
> Oct 23 14:08:52 hostname kernel: [ 4940.474458] lsof            x
> 0000000000030001     0  1139      0 0x00000002
> Oct 23 14:08:52 hostname kernel: [ 4940.477714]  ffff9e4a40062640
> ffff9e4796162e00 ffff9e4a40062640 ffff9e468ef84000
> Oct 23 14:08:52 hostname kernel: [ 4940.480966]  ffff9e468ef83f10
> ffff9e468ef83b18 ffff9e4a40062a00 00007facb85592b6
> Oct 23 14:08:52 hostname kernel: [ 4940.484228]  0000000000030001
> ffffffffa15fb513 ffff9e4a40062640 ffffffffa104e5c8
> Oct 23 14:08:52 hostname kernel: [ 4940.487471] Call Trace:
> Oct 23 14:08:52 hostname kernel: [ 4940.490683]  [<ffffffffa15fb513>] ?
> schedule+0x33/0x90
> Oct 23 14:08:52 hostname kernel: [ 4940.493919]  [<ffffffffa104e5c8>] ?
> do_exit+0x708/0xac0
> Oct 23 14:08:52 hostname kernel: [ 4940.497169]  [<ffffffffa1600d47>] ?
> rewind_stack_do_exit+0x17/0x20
> Oct 23 14:08:52 hostname kernel: [ 4940.500409] lsof            x
> 0000000000030001     0  1139      0 0x00000002
> Oct 23 14:08:52 hostname kernel: [ 4940.503676]  ffff9e4a40062640
> ffff9e4796162e00 ffff9e4a40062640 ffff9e468ef84000
> Oct 23 14:08:52 hostname kernel: [ 4940.506964]  ffff9e468ef83f10
> ffff9e468ef83b18 ffff9e4a40062a00 00007facb85592b6
> Oct 23 14:08:52 hostname kernel: [ 4940.510230]  0000000000030001
> ffffffffa15fb513 ffff9e4a40062640 ffffffffa104e5c8
> Oct 23 14:08:52 hostname kernel: [ 4940.513532] Call Trace:
> Oct 23 14:08:52 hostname kernel: [ 4940.516807]  [<ffffffffa15fb513>] ?
> schedule+0x33/0x90
> Oct 23 14:08:52 hostname kernel: [ 4940.520129]  [<ffffffffa104e5c8>] ?
> do_exit+0x708/0xac0
> Oct 23 14:08:52 hostname kernel: [ 4940.523457]  [<ffffffffa1600d47>] ?
> rewind_stack_do_exit+0x17/0x20
> Oct 23 14:10:10 hostname kernel: [ 5018.824396] INFO: rcu_sched self-detected
> stall on CPU
> Oct 23 14:10:10 hostname kernel: [ 5018.826669]     2-...: (59999 ticks this
> GP) idle=609/140000000000001/0 softirq=954063/954063 fqs=15000 
> Oct 23 14:10:10 hostname kernel: [ 5018.828924]      (t=60000 jiffies g=333
> c=332 q=1)
> Oct 23 14:10:10 hostname kernel: [ 5018.831185] Task dump for CPU 2:
> Oct 23 14:10:10 hostname kernel: [ 5018.833406] lsof            R  running
> task
>        0  1213      1 0x0000000c
> Oct 23 14:10:10 hostname kernel: [ 5018.835688]  ffffffffa1a29f80
> ffffffffa10ed903 ffff9e4b2ec97240 ffffffffa1a29f80
> Oct 23 14:10:10 hostname kernel: [ 5018.837940]  0000000000000000
> ffff9e4a13a9cc80 ffffffffa109c05a ffffffffa10a45f9
> Oct 23 14:10:10 hostname kernel: [ 5018.840205]  003b9aca00000000
> ffff9e4b2ec83ee0 0000000000000086 ffffffffa10a5c5c
> Oct 23 14:10:10 hostname kernel: [ 5018.842491] Call Trace:
> Oct 23 14:10:10 hostname kernel: [ 5018.844706]  <IRQ>  [<ffffffffa10ed903>]
> ?
> rcu_dump_cpu_stacks+0x88/0xaa
> Oct 23 14:10:10 hostname kernel: [ 5018.846997]  [<ffffffffa109c05a>] ?
> rcu_check_callbacks+0x69a/0x8d0
> Oct 23 14:10:10 hostname kernel: [ 5018.849251]  [<ffffffffa10a45f9>] ?
> timekeeping_update+0xe9/0x140
> Oct 23 14:10:10 hostname kernel: [ 5018.851535]  [<ffffffffa10a5c5c>] ?
> update_wall_time+0x45c/0x770
> Oct 23 14:10:10 hostname kernel: [ 5018.853790]  [<ffffffffa10accb0>] ?
> tick_sched_handle.isra.14+0x30/0x30
> Oct 23 14:10:10 hostname kernel: [ 5018.856086]  [<ffffffffa109ef73>] ?
> update_process_times+0x23/0x50
> Oct 23 14:10:10 hostname kernel: [ 5018.858336]  [<ffffffffa10acce3>] ?
> tick_sched_timer+0x33/0x60
> Oct 23 14:10:10 hostname kernel: [ 5018.860593]  [<ffffffffa109f9a9>] ?
> __hrtimer_run_queues+0xb9/0x150
> Oct 23 14:10:10 hostname kernel: [ 5018.862810]  [<ffffffffa109fc35>] ?
> hrtimer_interrupt+0x95/0x190
> Oct 23 14:10:10 hostname kernel: [ 5018.865048]  [<ffffffffa10327de>] ?
> smp_trace_apic_timer_interrupt+0x5e/0x90
> Oct 23 14:10:10 hostname kernel: [ 5018.867302]  [<ffffffffa160000f>] ?
> apic_timer_interrupt+0x7f/0x90
> Oct 23 14:10:10 hostname kernel: [ 5018.869557]  <EOI>  [<ffffffffa108273d>]
> ?
> queued_spin_lock_slowpath+0xed/0x170
> Oct 23 14:10:10 hostname kernel: [ 5018.871826]  [<ffffffffa13680aa>] ?
> lockref_get_not_dead+0x3a/0x80
> Oct 23 14:10:10 hostname kernel: [ 5018.874097]  [<ffffffffa1162f3e>] ?
> unlazy_walk+0xee/0x180
> Oct 23 14:10:10 hostname kernel: [ 5018.876372]  [<ffffffffa11632de>] ?
> complete_walk+0x2e/0x70
> Oct 23 14:10:10 hostname kernel: [ 5018.878684]  [<ffffffffa1165453>] ?
> path_lookupat+0x93/0x100
> Oct 23 14:10:10 hostname kernel: [ 5018.880953]  [<ffffffffa1167859>] ?
> filename_lookup+0x99/0x150
> Oct 23 14:10:10 hostname kernel: [ 5018.883225]  [<ffffffffa1160d0e>] ?
> pipe_read+0x27e/0x340
> Oct 23 14:10:10 hostname kernel: [ 5018.885532]  [<ffffffffa11674fa>] ?
> getname_flags+0x6a/0x1d0
> Oct 23 14:10:10 hostname kernel: [ 5018.887824]  [<ffffffffa115d4c4>] ?
> vfs_fstatat+0x44/0x90
> Oct 23 14:10:10 hostname kernel: [ 5018.890126]  [<ffffffffa115d96d>] ?
> SYSC_newlstat+0x1d/0x40
> Oct 23 14:10:10 hostname kernel: [ 5018.892443]  [<ffffffffa1158b12>] ?
> vfs_read+0x112/0x130
> Oct 23 14:10:10 hostname kernel: [ 5018.894717]  [<ffffffffa1159d7d>] ?
> SyS_read+0x3d/0x90
> Oct 23 14:10:10 hostname kernel: [ 5018.896912]  [<ffffffffa15ff41f>] ?
> entry_SYSCALL_64_fastpath+0x17/0x93
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.
Comment 5 Joe Yasi 2016-10-28 22:25:20 UTC
Created attachment 243071 [details]
attachment-26294-0.html

On Fri, Oct 28, 2016 at 5:52 PM, Andrew Morton <akpm@linux-foundation.org>
wrote:

>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 24 Oct 2016 01:27:15 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=180101
> >
> >             Bug ID: 180101
> >            Summary: BUG: unable to handle kernel paging request at x with
> >                     "mm: remove gup_flags FOLL_WRITE games from
> >                     __get_user_pages()"
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 4.8.4
> >           Hardware: x86-64
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: joe.yasi@gmail.com
> >         Regression: No
> >
> > After updating to 4.8.3 and 4.8.4, I am having stability issues. I can
> also
> > reproduce them with 4.7.10. This issue does not occur with 4.8.2. I can
> also
> > not reproduce after reverting the security fix
> > 89eeba1594ac641a30b91942961e80fae978f839 "mm: remove gup_flags
> FOLL_WRITE games
> > from __get_user_pages()" with 4.8.4.
>
> That's 19be0eaffa3ac7d8eb ("mm: remove gup_flags FOLL_WRITE games from
> __get_user_pages()") in the upstream tree.
>
> I seem to recall a fix for that patch went flying past earlier this
> week.  Perhaps Linus recalls?
>
> 19be0eaffa3ac7d8eb has gone into a billion -stable trees so we'll need
> to be attentive...
>
>
I've been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm: remove
gup_flags FOLL_WRITE games from __get_user_pages()") reverted. I initially
suspected it because I hadn't seen the issue until 4.8.3, and also saw it
when I tried 4.7.10. Initially, I wasn't able to reproduce it with 4.8.2,
but I've since been able to do that. This smells like a race condition
somewhere. It's possible I just happened to never encounter that race
before.

The /home partition in question is btrfs on bcache in writethrough mode.
The cache drive is an 180 GB Intel SATA SSD, and the backing device is two
WD 3 TB SATA HDDs configured in MD RAID 10 f2 layout. / is btrfs on an NVMe
SSD.

I've also seen btrfs checksum errors in the kernel log when reproducing
this. Rebooting and running btrfs scrub finds nothing though so it seems
like in memory corruption.

Thanks,
Joe
Comment 6 Linus Torvalds 2016-10-28 23:12:43 UTC
[ Chris, Kent, ignore the subject line, that was a mis-attribution of
the cause ]

On Fri, Oct 28, 2016 at 3:25 PM, Joseph Yasi <joe.yasi@gmail.com> wrote:
>
> I've been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm: remove
> gup_flags FOLL_WRITE games from __get_user_pages()") reverted.

Yeah, this doesn't look to have anything to do with that commit.

>   This smells like a race condition
> somewhere. It's possible I just happened to never encounter that race
> before.

It looks like some seriously odd corruption. It's doing spin_lock()
inside lockref_get_not_dead(), which is just a spinlock in the dentry.
There's no way it should cause problems.

The code disassembles to

   0: 45 31 c9             xor    %r9d,%r9d
   3: 85 c0                 test   %eax,%eax
   5: 74 44                 je     0x4b
   7: 48 89 c2             mov    %rax,%rdx
   a: c1 e8 12             shr    $0x12,%eax
   d: 48 c1 ea 0c           shr    $0xc,%rdx
  11: 83 e8 01             sub    $0x1,%eax
  14: 83 e2 30             and    $0x30,%edx
  17: 48 98                 cltq
  19: 48 81 c2 c0 6e 01 00 add    $0x16ec0,%rdx
  20: 48 03 14 c5 a0 21 a7 add    -0x5e58de60(,%rax,8),%rdx
  27: a1
  28:* 48 89 0a             mov    %rcx,(%rdx) <-- trapping instruction
  2b: 8b 41 08             mov    0x8(%rcx),%eax
  2e: 85 c0                 test   %eax,%eax
  30: 75 09                 jne    0x3b
  32: f3 90                 pause
  34: 8b 41 08             mov    0x8(%rcx),%eax
  37: 85 c0                 test   %eax,%eax
  39: 74 f7                 je     0x32

where the beginning of that sequence is the "decode_tail() code, and I
think the trapping instruction is the

                WRITE_ONCE(prev->next, node);

so it's from kernel/locking/qspinlock.c:536:

                prev = decode_tail(old);
                /*
                 * The above xchg_tail() is also a load of @lock which
generates,
                 * through decode_tail(), a pointer.
                 *
                 * The address dependency matches the RELEASE of xchg_tail()
                 * such that the access to @prev must happen after.
                 */
                smp_read_barrier_depends();

                WRITE_ONCE(prev->next, node);

                pv_wait_node(node, prev);

and yes, %rdx (which should contain that pointer to 'prev') has that
bogus pointer value 00007facb85592b6.

So that's a core spinlock in the dentry being corrupted.

Quite frankly, I'm somewhat suspicious of this:

  Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
     vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi

ie those out-of-tree vbox modules..

But for others, here's a cleaned-up copy of the oops in case somebody
else sees something.

  BUG: unable to handle kernel paging request at 00007facb85592b6
  IP: queued_spin_lock_slowpath+0xe1/0x170
  PGD 7cee19067 PUD 0
  Oops: 0002 [#1] PREEMPT SMP
  Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
     vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
     snd_hda_codec_realtek snd_hda_codec_generic uvcvideo
     videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
     snd_usb_audio videodev snd_usbmidi_lib media snd_hda_intel
     snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss snd_mixer_oss
     snd_pcm input_leds intel_rapl x86_pkg_temp_thermal btusb
     intel_powerclamp crct10dif_pclmul btrtl btbcm efi_pstore
     crc32_pclmul btintel crc32c_intel bluetooth ghash_clmulni_intel
     aesni_intel aes_x86_64 snd_seq_oss lrw glue_helper ablk_helper
     cryptd intel_cstate snd_seq_midi snd_rawmidi intel_rapl_perf
     snd_seq_midi_event snd_seq efivars snd_seq_device snd_timer snd
     soundcore wl(PO) cfg80211 rfkill sg battery intel_lpss_acpi
     intel_lpss mfd_core acpi_pad tpm_tis acpi_als tpm_tis_core
     kfifo_buf tpm industrialio nfsd auth_rpcgss coretemp oid_registry
     nfs_acl lockd loop grace sunrpc efivarfs ipv6 crc_ccitt hid_generic
     usbhid uas usb_storage igb e1000e dca ptp mxm_wmi bcache psmouse
     i915 intel_gtt pps_core drm_kms_helper xhci_pci hwmon syscopyarea
     xhci_hcd sysfillrect sysimgblt i2c_algo_bit fb_sys_fops usbcore
     sr_mod drm cdrom i2c_core usb_common fan thermal
     pinctrl_sunrisepoint wmi video pinctrl_intel button
  CPU: 3 PID: 1139 Comm: lsof Tainted: P           O    4.8.3-customskl #1
  Hardware name: System manufacturer System Product Name/Z170-DELUXE,
BIOS 2202 09/19/2016
  task: ffff9e4a40062640 task.stack: ffff9e468ef80000
  RIP: 0010:[<ffffffffa1082731>]  [<ffffffffa1082731>]
queued_spin_lock_slowpath+0xe1/0x170
  RSP: 0018:ffff9e468ef83d00  EFLAGS: 00010202
  RAX: 0000000000001fff RBX: ffff9e494f7f2718 RCX: ffff9e4b2ecd6ec0
  RDX: 00007facb85592b6 RSI: 0000000080000000 RDI: ffff9e494f7f2718
  RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000000
  R10: 0000000020ab886e R11: ffff9e494f7f26f8 R12: 0000000000000000
  R13: ffff9e494f7f26c0 R14: ffff9e468ef83d90 R15: 0000000000000000
  FS:  00007f3344595800(0000) GS:ffff9e4b2ecc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007facb85592b6 CR3: 00000007daf09000 CR4: 00000000003406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    lockref_get_not_dead+0x3a/0x80
    unlazy_walk+0xee/0x180
    complete_walk+0x2e/0x70
    path_lookupat+0x93/0x100
    filename_lookup+0x99/0x150
    pipe_read+0x27e/0x340
    getname_flags+0x6a/0x1d0
    vfs_fstatat+0x44/0x90
    SYSC_newlstat+0x1d/0x40
    vfs_read+0x112/0x130
    SyS_read+0x3d/0x90
    entry_SYSCALL_64_fastpath+0x17/0x93
  Code: c1 e0 10 45 31 c9 85 c0 74 44 48 89 c2 c1 e8 12 48 c1 ea 0c 83
e8 01 83 e2 30 48 98 48 81 c2 c0 6e 01 00 48 03 14 c5 a0 21 a7 a1 <48>
89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b
  RIP  [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
   RSP <ffff9e468ef83d00>
  CR2: 00007facb85592b6

> The /home partition in question is btrfs on bcache in writethrough mode. The
> cache drive is an 180 GB Intel SATA SSD, and the backing device is two WD 3
> TB SATA HDDs configured in MD RAID 10 f2 layout. / is btrfs on an NVMe SSD.
>
> I've also seen btrfs checksum errors in the kernel log when reproducing
> this. Rebooting and running btrfs scrub finds nothing though so it seems
> like in memory corruption.

I'm adding Chris Mason and Kent Overstreet to the participants,
because we did have a recent btrfs memory corruption thing. This
corruption seems to be pretty widespread through, you migth also want
to just run "memtest" on your machine.

*Most* memory corruption tends to be due to software issues, but
sometimes it really ends up being the memory itself going bad.

But also, please test if this happens without the out-of-tree modules?

                Kubys
Comment 7 Joe Yasi 2016-10-29 01:00:07 UTC
Created attachment 243091 [details]
attachment-30075-0.html

On Oct 28, 2016 7:12 PM, "Linus Torvalds" <torvalds@linux-foundation.org>
wrote:
>
> [ Chris, Kent, ignore the subject line, that was a mis-attribution of
> the cause ]
>
> On Fri, Oct 28, 2016 at 3:25 PM, Joseph Yasi <joe.yasi@gmail.com> wrote:
> >
> > I've been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm:
remove
> > gup_flags FOLL_WRITE games from __get_user_pages()") reverted.
>
> Yeah, this doesn't look to have anything to do with that commit.
>
> >   This smells like a race condition
> > somewhere. It's possible I just happened to never encounter that race
> > before.
>
> It looks like some seriously odd corruption. It's doing spin_lock()
> inside lockref_get_not_dead(), which is just a spinlock in the dentry.
> There's no way it should cause problems.
>
> The code disassembles to
>
>    0: 45 31 c9             xor    %r9d,%r9d
>    3: 85 c0                 test   %eax,%eax
>    5: 74 44                 je     0x4b
>    7: 48 89 c2             mov    %rax,%rdx
>    a: c1 e8 12             shr    $0x12,%eax
>    d: 48 c1 ea 0c           shr    $0xc,%rdx
>   11: 83 e8 01             sub    $0x1,%eax
>   14: 83 e2 30             and    $0x30,%edx
>   17: 48 98                 cltq
>   19: 48 81 c2 c0 6e 01 00 add    $0x16ec0,%rdx
>   20: 48 03 14 c5 a0 21 a7 add    -0x5e58de60(,%rax,8),%rdx
>   27: a1
>   28:* 48 89 0a             mov    %rcx,(%rdx) <-- trapping instruction
>   2b: 8b 41 08             mov    0x8(%rcx),%eax
>   2e: 85 c0                 test   %eax,%eax
>   30: 75 09                 jne    0x3b
>   32: f3 90                 pause
>   34: 8b 41 08             mov    0x8(%rcx),%eax
>   37: 85 c0                 test   %eax,%eax
>   39: 74 f7                 je     0x32
>
> where the beginning of that sequence is the "decode_tail() code, and I
> think the trapping instruction is the
>
>                 WRITE_ONCE(prev->next, node);
>
> so it's from kernel/locking/qspinlock.c:536:
>
>                 prev = decode_tail(old);
>                 /*
>                  * The above xchg_tail() is also a load of @lock which
> generates,
>                  * through decode_tail(), a pointer.
>                  *
>                  * The address dependency matches the RELEASE of
xchg_tail()
>                  * such that the access to @prev must happen after.
>                  */
>                 smp_read_barrier_depends();
>
>                 WRITE_ONCE(prev->next, node);
>
>                 pv_wait_node(node, prev);
>
> and yes, %rdx (which should contain that pointer to 'prev') has that
> bogus pointer value 00007facb85592b6.
>
> So that's a core spinlock in the dentry being corrupted.
>
> Quite frankly, I'm somewhat suspicious of this:
>
>   Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
>      vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
>
> ie those out-of-tree vbox modules..
>
> But for others, here's a cleaned-up copy of the oops in case somebody
> else sees something.
>
>   BUG: unable to handle kernel paging request at 00007facb85592b6
>   IP: queued_spin_lock_slowpath+0xe1/0x170
>   PGD 7cee19067 PUD 0
>   Oops: 0002 [#1] PREEMPT SMP
>   Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
>      vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
>      snd_hda_codec_realtek snd_hda_codec_generic uvcvideo
>      videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
>      snd_usb_audio videodev snd_usbmidi_lib media snd_hda_intel
>      snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss snd_mixer_oss
>      snd_pcm input_leds intel_rapl x86_pkg_temp_thermal btusb
>      intel_powerclamp crct10dif_pclmul btrtl btbcm efi_pstore
>      crc32_pclmul btintel crc32c_intel bluetooth ghash_clmulni_intel
>      aesni_intel aes_x86_64 snd_seq_oss lrw glue_helper ablk_helper
>      cryptd intel_cstate snd_seq_midi snd_rawmidi intel_rapl_perf
>      snd_seq_midi_event snd_seq efivars snd_seq_device snd_timer snd
>      soundcore wl(PO) cfg80211 rfkill sg battery intel_lpss_acpi
>      intel_lpss mfd_core acpi_pad tpm_tis acpi_als tpm_tis_core
>      kfifo_buf tpm industrialio nfsd auth_rpcgss coretemp oid_registry
>      nfs_acl lockd loop grace sunrpc efivarfs ipv6 crc_ccitt hid_generic
>      usbhid uas usb_storage igb e1000e dca ptp mxm_wmi bcache psmouse
>      i915 intel_gtt pps_core drm_kms_helper xhci_pci hwmon syscopyarea
>      xhci_hcd sysfillrect sysimgblt i2c_algo_bit fb_sys_fops usbcore
>      sr_mod drm cdrom i2c_core usb_common fan thermal
>      pinctrl_sunrisepoint wmi video pinctrl_intel button
>   CPU: 3 PID: 1139 Comm: lsof Tainted: P           O    4.8.3-customskl #1
>   Hardware name: System manufacturer System Product Name/Z170-DELUXE,
> BIOS 2202 09/19/2016
>   task: ffff9e4a40062640 task.stack: ffff9e468ef80000
>   RIP: 0010:[<ffffffffa1082731>]  [<ffffffffa1082731>]
> queued_spin_lock_slowpath+0xe1/0x170
>   RSP: 0018:ffff9e468ef83d00  EFLAGS: 00010202
>   RAX: 0000000000001fff RBX: ffff9e494f7f2718 RCX: ffff9e4b2ecd6ec0
>   RDX: 00007facb85592b6 RSI: 0000000080000000 RDI: ffff9e494f7f2718
>   RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000000
>   R10: 0000000020ab886e R11: ffff9e494f7f26f8 R12: 0000000000000000
>   R13: ffff9e494f7f26c0 R14: ffff9e468ef83d90 R15: 0000000000000000
>   FS:  00007f3344595800(0000) GS:ffff9e4b2ecc0000(0000)
knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007facb85592b6 CR3: 00000007daf09000 CR4: 00000000003406e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>     lockref_get_not_dead+0x3a/0x80
>     unlazy_walk+0xee/0x180
>     complete_walk+0x2e/0x70
>     path_lookupat+0x93/0x100
>     filename_lookup+0x99/0x150
>     pipe_read+0x27e/0x340
>     getname_flags+0x6a/0x1d0
>     vfs_fstatat+0x44/0x90
>     SYSC_newlstat+0x1d/0x40
>     vfs_read+0x112/0x130
>     SyS_read+0x3d/0x90
>     entry_SYSCALL_64_fastpath+0x17/0x93
>   Code: c1 e0 10 45 31 c9 85 c0 74 44 48 89 c2 c1 e8 12 48 c1 ea 0c 83
> e8 01 83 e2 30 48 98 48 81 c2 c0 6e 01 00 48 03 14 c5 a0 21 a7 a1 <48>
> 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b
>   RIP  [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
>    RSP <ffff9e468ef83d00>
>   CR2: 00007facb85592b6
>
> > The /home partition in question is btrfs on bcache in writethrough
mode. The
> > cache drive is an 180 GB Intel SATA SSD, and the backing device is two
WD 3
> > TB SATA HDDs configured in MD RAID 10 f2 layout. / is btrfs on an NVMe
SSD.
> >
> > I've also seen btrfs checksum errors in the kernel log when reproducing
> > this. Rebooting and running btrfs scrub finds nothing though so it seems
> > like in memory corruption.
>
> I'm adding Chris Mason and Kent Overstreet to the participants,
> because we did have a recent btrfs memory corruption thing. This
> corruption seems to be pretty widespread through, you migth also want
> to just run "memtest" on your machine.
>
> *Most* memory corruption tends to be due to software issues, but
> sometimes it really ends up being the memory itself going bad.
>
> But also, please test if this happens without the out-of-tree modules?

I was testing it without VirtualBox and broadcom-wl out-of-tree modules,
and the machine locked up and won't POST anymore. The motherboard is
claiming it's the CPU, so it looks like this was hardware. Sorry for the
noise.

-Joe

>
>                 Kubys
Comment 8 Linus Torvalds 2016-10-29 01:27:01 UTC
On Fri, Oct 28, 2016 at 6:00 PM, Joseph Yasi <joe.yasi@gmail.com> wrote:
>>
>> But also, please test if this happens without the out-of-tree modules?
>
> I was testing it without VirtualBox and broadcom-wl out-of-tree modules, and
> the machine locked up and won't POST anymore. The motherboard is claiming
> it's the CPU, so it looks like this was hardware. Sorry for the noise.

Hey, sorry about your hardware, but happy it isn't our software ;)

        Linus Torvalds
Comment 9 Joe Yasi 2016-11-10 14:48:12 UTC
I'm closing this. Replacing the motherboard fixed the issue. Some debris was shorting out the traces leading to the CPU socket.

Note You need to log in before you can comment on or make changes to this bug.