Bug 99101

Summary: kernel bug at mm/memory.c:3137
Product: Memory Management Reporter: Trevor Saunders (tbsaunde)
Component: NUMA/discontigmemAssignee: mm_numa-discontigmem
Status: NEW ---    
Severity: normal CC: lstoakes, robert
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.1.0-rc5+ Subsystem:
Regression: No Bisected commit-id:
Attachments: test.c
test.c
test.c

Description Trevor Saunders 2015-05-28 14:40:24 UTC
str: rather involved but
build github.com/mozilla/rr
use rr to record startup of firefox (tested with nightly, but other things should probably work too.  You just need to run till a window is shown.
use rr to replay the recording.
When rr drops you into gdb set a breakpoint on do_main and tell gdb to continue.

results:
the following is dumped to dmesg and the rr process is killed.
[  490.073615] ------------[ cut here ]------------
[  490.073636] kernel BUG at mm/memory.c:3137!
[  490.073648] invalid opcode: 0000 [#1] SMP
[  490.073665] Modules linked in: ctr ccm bnep cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc ppdev lp parport x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi arc4 iwldvm coretemp kvm_intel kvm snd_hda_codec_hdmi mac80211 i915 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller btusb btbcm snd_hda_codec btintel iwlwifi uvcvideo joydev bluetooth snd_hda_core snd_hwdep snd_pcm iTCO_wdt drm_kms_helper iTCO_vendor_support crct10dif_pclmul crc32_pclmul cfg80211 drm psmouse videobuf2_vmalloc thinkpad_acpi videobuf2_memops videobuf2_core ghash_clmulni_intel v4l2_common evdev snd_timer serio_raw nvram videodev aesni_intel aes_x86_64 lrw gf128mul glue_helper
[  490.073980]  snd media ablk_helper pcspkr cryptd tpm_tis battery wmi tpm shpchp ac soundcore rfkill i2c_i801 i2c_algo_bit i2c_core processor video lpc_ich mfd_core button mei_me mei ext4 crc16 mbcache jbd2 dm_mod sg sd_mod crc32c_intel ahci libahci e1000e libata ehci_pci xhci_pci ehci_hcd sdhci_pci xhci_hcd scsi_mod sdhci mmc_core ptp usbcore pps_core thermal usb_common thermal_sys [last unloaded: speakup]
[  490.074168] CPU: 2 PID: 10737 Comm: rr Tainted: G         C      4.1.0-rc5+ #1
[  490.074185] Hardware name: LENOVO 232039U/232039U, BIOS G2ET95WW (2.55 ) 07/09/2013
[  490.074202] task: ffff8804098d0290 ti: ffff8804098e4000 task.ti: ffff8804098e4000
[  490.074219] RIP: 0010:[<ffffffff81188d5f>]  [<ffffffff81188d5f>] handle_mm_fault+0x115f/0x1640
[  490.074244] RSP: 0018:ffff8804098e7bc8  EFLAGS: 00010246
[  490.074258] RAX: 0000000000000100 RBX: 0000000000000000 RCX: 0000000000000120
[  490.074274] RDX: ffff8803f31cc7c0 RSI: 00003ffffffff000 RDI: 00000003f31cc067
[  490.074290] RBP: 00002aaabccf8000 R08: 00000003d1ccc120 R09: 000000000001e6a0
[  490.074306] R10: ffffffff8172ce12 R11: 0000000000000120 R12: ffff8803efbaa748
[  490.074321] R13: 0000000000000000 R14: ffff88002e311f30 R15: ffff8800cabbb800
[  490.074338] FS:  00007f983585a740(0000) GS:ffff88041e280000(0000) knlGS:0000000000000000
[  490.074356] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  490.074369] CR2: 000000000357f1b0 CR3: 00000003f168a000 CR4: 00000000001407e0
[  490.074385] Stack:
[  490.074391]  0000000000000100 0000000000000040 ffff88041e5eeb00 ffffea0000000001
[  490.074415]  0000000000000100 ffff8800000007c0 00000003d1ccc120 0000000000000000
[  490.074438]  ffff88041e5ee210 ffff88040000001f ffff88041e5eeb08 0000000000000000
[  490.074462] Call Trace:
[  490.074472]  [<ffffffff81182c68>] ? follow_page_pte+0x2b8/0x320
[  490.074486]  [<ffffffff81183154>] ? __get_user_pages+0x174/0x600
[  490.074504]  [<ffffffff812e1cf0>] ? radix_tree_lookup_slot+0x10/0x30
[  490.074519]  [<ffffffff81183e6e>] ? __access_remote_vm+0xde/0x2e0
[  490.074536]  [<ffffffff8123106d>] ? mem_rw.isra.14+0xad/0x180
[  490.074552]  [<ffffffff810a1fda>] ? set_next_entity+0x6a/0x480
[  490.074568]  [<ffffffff811c9c93>] ? __vfs_write+0x23/0xf0
[  490.074582]  [<ffffffff811cc745>] ? __sb_start_write+0x45/0x100
[  490.074598]  [<ffffffff8125f6b1>] ? security_file_permission+0x21/0xa0
[  490.074614]  [<ffffffff811ca384>] ? vfs_write+0xa4/0x1b0
[  490.074628]  [<ffffffff811cb25b>] ? SyS_pwrite64+0x6b/0xa0
[  490.074643]  [<ffffffff81574372>] ? system_call_fastpath+0x16/0x75
[  490.074658] Code: 24 90 00 00 00 48 89 8c 24 98 00 00 00 48 8d 74 24 68 48 89 54 24 70 89 44 24 68 49 8b 84 24 90 00 00 00 ff 50 18 e9 bb fa ff ff <0f> 0b 8d 50 e2 83 fa 01 0f 86 4b 03 00 00 83 f8 1d c7 44 24 30
[  490.074859] RIP  [<ffffffff81188d5f>] handle_mm_fault+0x115f/0x1640
[  490.074876]  RSP <ffff8804098e7bc8>
[  490.074899] ---[ end trace c03716d799219ad5 ]---
Comment 1 Trevor Saunders 2015-06-04 15:06:06 UTC
Somewhat simpler str:
still build github.com/mozilla/rr
compile this program with gcc -pthread test.c

#include <pthread.h>
#include <sys/time.h>

static pthread_t reader;

static void* reader_thread(void* dontcare) {
  return NULL;
}

int main(int argc, char* argv[]) {
  struct timeval ts;

  /* (Kick on the syscallbuf if it's enabled.) */
  gettimeofday(&ts, NULL);


  pthread_create(&reader, NULL, reader_thread, NULL);
  pthread_join(reader, NULL);

  return 0;
}

then run
rr record ./a.out
rr replay
when you are dropped into gdb continnue the inferior and see the connection is closed rr is killed and the same kernel bug message appears in dmesg.
Comment 2 Robert O'Callahan 2015-06-19 12:25:54 UTC
I have reduced this to a simple test program that doesn't involve rr. The program is basically

  int fd = open("/proc/self/mem", O_RDWR);
  void* p = mmap(NULL, PAGE_SIZE*2, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  mmap(p, PAGE_SIZE, PROT_NONE, MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  pwrite(fd, zeroes, sizeof(zeroes), (size_t)p); 

It seems to be as simple as a write to /proc/.../mem that crosses VMAs.

On 4.0.4-301.fc22.x86_64, this program fails with a SIGSEGV and we get this in the kernel log:

[14719.326561] ------------[ cut here ]------------
[14719.326570] kernel BUG at mm/memory.c:3042!
[14719.326573] invalid opcode: 0000 [#164] SMP 
[14719.326577] Modules linked in: ccm nfnetlink_queue nfnetlink_log nfnetlink snd_seq_dummy rfcomm xt_CHECKSUM iptable_mangle ipt_MASQUERAD
E nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge ebtable_filter ebtables ip6table_filter ip6_tables ppdev parport_pc parport fuse vmw_vsock_vmci_transport vsock vmw_vmci bnep vfat fat nvidia(POE) arc4 iwldvm uvcvideo btusb videobuf2_vmalloc videobuf2_core videobuf2_memops v4l2_common videodev bluetooth cdc_mbim media cdc_ncm usbnet cdc_acm cdc_wdm mii mac80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support snd_hda_intel mmc_block snd_hda_controller snd_hda_codec intel_rapl iosf_mbi drm iwlwifi snd_hwdep x86_pkg_temp_thermal coretemp snd_seq snd_seq_device
[14719.326643]  cfg80211 snd_pcm sdhci_pci sdhci mmc_core lpc_ich mfd_core snd_timer kvm_intel mei_me mei i2c_i801 shpchp kvm thinkpad_acpi joydev snd serio_raw soundcore tpm_tis rfkill tpm video wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc dm_crypt 8021q garp stp llc mrp crct10dif_pclmul crc32_pclmul e1000e crc32c_intel firewire_ohci ghash_clmulni_intel firewire_core uas usb_storage crc_itu_t ptp pps_core
[14719.326688] CPU: 6 PID: 21417 Comm: test Tainted: P      D W  OE   4.0.4-301.fc22.x86_64 #1
[14719.326691] Hardware name: LENOVO 243827M/243827M, BIOS G5ET61WW (2.03 ) 09/13/2012
[14719.326695] task: ffff8807405831b0 ti: ffff88061c00c000 task.ti: ffff88061c00c000
[14719.326697] RIP: 0010:[<ffffffff811d2dd4>]  [<ffffffff811d2dd4>] handle_mm_fault+0x12a4/0x1840
[14719.326710] RSP: 0018:ffff88061c00fbf8  EFLAGS: 00010246
[14719.326712] RAX: 0000000000000100 RBX: 0000000000000000 RCX: 0000000000000120
[14719.326715] RDX: ffff8806a614c8d0 RSI: 00003ffffffff000 RDI: 00000006a614c067
[14719.326717] RBP: ffff88061c00fcd8 R08: 00000005ef2dc120 R09: ffff8807a0bbba70
[14719.326719] R10: ffff8800000008d0 R11: 0000000000000120 R12: 0000000000000000
[14719.326721] R13: ffff8807a0bbba70 R14: ffff8806a53e9c38 R15: 00007f9ee9d1a000
[14719.326725] FS:  00007f9ee9cec700(0000) GS:ffff88081dd80000(0000) knlGS:0000000000000000
[14719.326727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14719.326729] CR2: 0000000000601000 CR3: 00000007f4db5000 CR4: 00000000001407e0
[14719.326732] Stack:
[14719.326734]  ffff88061c00fc80 0000000000000000 ffff88081dd98e38 ffff88061c00fda8
[14719.326738]  ffff88061c00fc88 ffffffff811cbc64 ffff8807a0bbba70 ffff8800ce994400
[14719.326742]  01000005ef2dc120 00000000000008d0 ffffea0000000001 ffff8800ce994400
[14719.326747] Call Trace:
[14719.326755]  [<ffffffff811cbc64>] ? follow_page_pte+0xc4/0x400
[14719.326761]  [<ffffffff811cc14c>] ? follow_page_mask+0x1ac/0x330
[14719.326766]  [<ffffffff811cc4c3>] __get_user_pages+0x1f3/0x710
[14719.326772]  [<ffffffff811cce32>] get_user_pages+0x52/0x60
[14719.326778]  [<ffffffff811cd19c>] __access_remote_vm+0xec/0x300
[14719.326783]  [<ffffffff811d353f>] access_remote_vm+0x1f/0x30
[14719.326790]  [<ffffffff8128b11b>] mem_rw.isra.13+0xbb/0x190
[14719.326795]  [<ffffffff8128b20b>] mem_write+0x1b/0x20
[14719.326801]  [<ffffffff8121d1d4>] vfs_write+0xb4/0x210
[14719.326809]  [<ffffffff810226a5>] ? do_audit_syscall_entry+0x55/0x80
[14719.326814]  [<ffffffff8121e07a>] SyS_pwrite64+0x9a/0xc0
[14719.326823]  [<ffffffff81788c89>] system_call_fastpath+0x12/0x17
[14719.326825] Code: ff ff 48 89 75 b8 48 8d 75 90 48 89 4d c0 48 89 55 98 89 45 90 49 8b 86 90 00 00 00 ff 50 18 4c 8b 95 60 ff ff ff e9 9e fa ff ff <0f> 0b 83 e8 1e 83 f8 01 0f 86 7a 03 00 00 c7 85 60 ff ff ff 10 
[14719.326868] RIP  [<ffffffff811d2dd4>] handle_mm_fault+0x12a4/0x1840
[14719.326873]  RSP <ffff88061c00fbf8>
[14719.326877] ---[ end trace a3f27ce02b728e21 ]---
Comment 3 Robert O'Callahan 2015-06-19 12:26:52 UTC
Created attachment 180381 [details]
test.c
Comment 4 Robert O'Callahan 2015-06-19 12:27:26 UTC
Created attachment 180391 [details]
test.c

Sorry, attached the wrong file previously.
Comment 5 Robert O'Callahan 2015-06-19 12:37:19 UTC
Created attachment 180401 [details]
test.c

Actually I can simplify it further.

This seems to boil down to writing to /proc/.../mem where memory has been mapped PROT_NONE.
Comment 6 Robert O'Callahan 2015-06-19 12:38:19 UTC
This definitely worked in older kernels, though I don't have a precise regression range.
Comment 7 Lorenzo Stoakes 2016-09-26 08:39:41 UTC
I contributed a patch to address this which has been merged into 4.8rc8, assuming there are no problems discovered with the patch during the testing window, this should be resolved.

ref: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=38e088546522e1e86d2b8f401a1354ad3a9b3303
Comment 8 Robert O'Callahan 2016-09-26 09:56:47 UTC
Thanks!