got panic when run lustre-racer over NFS test on latest upstream: ''' [ 63.927876] NFSD: starting 90-second grace period (net f00000a8) [ 63.981533] FS-Cache: Loaded [ 64.113298] FS-Cache: Netfs 'nfs' registered for caching [ 64.129363] Key type dns_resolver registered [ 64.291524] NFS: Registering the id_resolver key type [ 64.297179] Key type id_resolver registered [ 64.301859] Key type id_legacy registered [ 106.154061] 8[62671]: segfault at 8 ip 00007fab2a5de918 sp 00007ffc6598c230 error 4 in ld-2.17.so[7fab2a5d3000+22000] [ 254.397650] 3[15193]: segfault at 0 ip 0000000000403e4f sp 00007ffed8116660 error 6 in 3 (deleted)[400000+6000] [-- MARK -- Fri Apr 20 20:00:00 2018] [ 414.125571] 8[78617]: segfault at 0 ip 0000000000403e4f sp 00007ffdee59a8e0 error 6 in 8 (deleted)[400000+6000] [ 500.677718] perf: interrupt took too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [ 506.303294] 7[53144]: segfault at 8 ip 00007f6e488a4918 sp 00007ffd0c3040d0 error 4 in ld-2.17.so[7f6e48899000+22000] [ 616.201854] 8[46687]: segfault at 4017a0 ip 00000000004017a0 sp 00007ffd095c2710 error 14 [ 637.442799] perf: interrupt took too long (3149 > 3148), lowering kernel.perf_event_max_sample_rate to 63000 [-- MARK -- Fri Apr 20 20:05:00 2018] [ 731.965938] 13[17830]: segfault at 0 ip 0000000000403e4f sp 00007ffdbaf15bd0 error 6 in 13 (deleted)[400000+6000] [ 803.352338] 0[46544]: segfault at 0 ip 0000000000403e4f sp 00007fffdd30aa50 error 6 in 0 (deleted)[400000+6000] [ 812.734251] 6[40484]: segfault at 0 ip 0000000000403e4f sp 00007ffedab551b0 error 6 in 6 (deleted)[400000+6000] [ 841.253201] perf: interrupt took too long (3969 > 3936), lowering kernel.perf_event_max_sample_rate to 50000 [-- MARK -- Fri Apr 20 20:10:00 2018] [ 1024.139506] 4[48815]: segfault at 0 ip 0000000000403e4f sp 00007fffdfc9aee0 error 6 in 4 (deleted)[400000+6000] [ 1086.414180] 18[38715]: segfault at 0 ip 0000000000403e4f sp 00007ffddf031c80 error 6 in 18 (deleted)[400000+6000] [ 1146.510480] perf: interrupt took too long (4972 > 4961), lowering kernel.perf_event_max_sample_rate to 40000 [ 1158.138011] NFS: server localhost error: fileid changed [ 1158.138011] fsid 0:45: expected fileid 0x0, got 0x84f [ 1158.149494] NFS: server localhost error: fileid changed [ 1158.149494] fsid 0:45: expected fileid 0x0, got 0x84f [ 1158.160963] NFS: server localhost error: fileid changed [ 1158.160963] fsid 0:45: expected fileid 0x0, got 0x84f [-- MARK -- Fri Apr 20 20:15:00 2018] [ 1392.954349] general protection fault: 0000 [#1] SMP PTI [ 1392.960185] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ext4 mbcache jbd2 loop nfsd auth_rpcgss nfs_acl lockd grace intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd glue_helper ipmi_ssif iTCO_wdt mei_me iTCO_vendor_support mxm_wmi pcspkr joydev mei sg i2c_i801 shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad ip_tables xfs libcrc32c sr_mod cdrom sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ahci libahci libata crc32c_intel i2c_core mdio dca dm_mirror dm_region_hash dm_log dm_mod [ 1393.031909] CPU: 52 PID: 5882 Comm: nfsd Not tainted 4.17.0-rc1 #1 [ 1393.038808] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 [ 1393.050167] RIP: 0010:prefetch_freepointer.isra.65+0xc/0x20 [ 1393.056389] RSP: 0018:ffffbc06089b7c30 EFLAGS: 00010286 [ 1393.062222] RAX: 0000000000000000 RBX: ff92b498cc49f800 RCX: 0000000000000183 [ 1393.070188] RDX: 0000000000000182 RSI: ff92b498cc49f800 RDI: ffff92b49f87de20 [ 1393.078154] RBP: 00000000014080c0 R08: 000000000002dfc0 R09: ffff92b479731070 [ 1393.086112] R10: 0000000001ce3140 R11: 0000000000000000 R12: ffff92b498cc490f [ 1393.094081] R13: ffff92b49f87de00 R14: ffffffffc0713fe3 R15: ffff92b49f87de00 [ 1393.102048] FS: 0000000000000000(0000) GS:ffff92b49f980000(0000) knlGS:0000000000000000 [ 1393.111081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1393.117497] CR2: 000000000241768c CR3: 0000000830e0a002 CR4: 00000000003606e0 [ 1393.125465] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1393.133432] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1393.141399] Call Trace: [ 1393.144137] kmem_cache_alloc+0xa3/0x1c0 [ 1393.148541] ? nfsd4_free_file_rcu+0x20/0x20 [nfsd] [ 1393.153990] nfs4_alloc_stid+0x23/0xb0 [nfsd] [ 1393.158873] nfsd4_process_open2+0xc6b/0x12a0 [nfsd] [ 1393.164434] ? fh_verify+0x33e/0x5f0 [nfsd] [ 1393.169117] ? nfsd4_process_open1+0x137/0x400 [nfsd] [ 1393.174771] nfsd4_open+0x4ce/0x790 [nfsd] [ 1393.179360] ? nfsd4_encode_operation+0xdf/0x1c0 [nfsd] [ 1393.185205] nfsd4_proc_compound+0x5aa/0x7b0 [nfsd] [ 1393.190668] nfsd_dispatch+0xb7/0x250 [nfsd] [ 1393.195447] svc_process_common+0x399/0x750 [sunrpc] [ 1393.201011] svc_process+0xeb/0x100 [sunrpc] [ 1393.205793] nfsd+0xe3/0x150 [nfsd] [ 1393.209694] kthread+0xf8/0x130 [ 1393.213215] ? nfsd_destroy+0x60/0x60 [nfsd] [ 1393.217987] ? kthread_bind+0x10/0x10 [ 1393.222083] ret_from_fork+0x35/0x40 [ 1393.226074] Code: 48 89 fb e8 57 f1 ff ff 48 89 df 48 89 ee 31 c9 5b 5d 31 d2 e9 46 f1 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 f6 74 09 8b 07 <48> 8b 04 06 0f 18 08 f3 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f [ 1393.247162] RIP: prefetch_freepointer.isra.65+0xc/0x20 RSP: ffffbc06089b7c30 [ 1393.255103] ---[ end trace 27db7bc3128451f1 ]--- [ 1393.260262] Kernel panic - not syncing: Fatal exception [ 1393.266153] WARNING: CPU: 82 PID: 504 at kernel/sched/core.c:1163 set_task_cpu+0x17b/0x190 [ 1393.275379] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ext4 mbcache jbd2 loop nfsd auth_rpcgss nfs_acl lockd grace intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd glue_helper ipmi_ssif iTCO_wdt mei_me iTCO_vendor_support mxm_wmi pcspkr joydev mei sg i2c_i801 shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad ip_tables xfs libcrc32c sr_mod cdrom sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ahci libahci libata crc32c_intel i2c_core mdio dca dm_mirror dm_region_hash dm_log dm_mod [ 1393.347085] CPU: 82 PID: 504 Comm: migration/82 Tainted: G D 4.17.0-rc1 #1 [ 1393.356211] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 [ 1393.367566] RIP: 0010:set_task_cpu+0x17b/0x190 [ 1393.372522] RSP: 0018:ffffbc060762be20 EFLAGS: 00010046 [ 1393.378351] RAX: 0000000000000000 RBX: ffff92b4dc962e00 RCX: 0000000000000000 [ 1393.386314] RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff92b4dc962e00 [ 1393.394276] RBP: ffff92bc9f3a23c0 R08: 0000000000000bab R09: 0000000000020ef2 [ 1393.402238] R10: 000000000000007f R11: ffffbc06358dfc34 R12: 000000000000000e [ 1393.410200] R13: 000000000000000e R14: ffffbc06358dfe70 R15: ffff92bc9f39d4b0 [ 1393.418163] FS: 0000000000000000(0000) GS:ffff92bc9f380000(0000) knlGS:0000000000000000 [ 1393.427191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1393.433602] CR2: 00007f20a0eb645f CR3: 0000000830e0a006 CR4: 00000000003606e0 [ 1393.441564] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1393.449526] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1393.457488] Call Trace: [ 1393.460221] move_queued_task+0x86/0x160 [ 1393.464596] migration_cpu_stop+0xf8/0x110 [ 1393.469168] ? sched_ttwu_pending+0xd0/0xd0 [ 1393.473838] cpu_stopper_thread+0x86/0x100 [ 1393.478414] smpboot_thread_fn+0x10e/0x160 [ 1393.482986] kthread+0xf8/0x130 [ 1393.486482] ? sort_range+0x20/0x20 [ 1393.490371] ? kthread_bind+0x10/0x10 [ 1393.494462] ret_from_fork+0x35/0x40 [ 1393.498450] Code: 0f ff ff ff 80 8b 5c 08 00 00 04 e9 2b ff ff ff 0f 0b e9 ca fe ff ff f7 43 60 fd ff ff ff 0f 84 d4 fe ff ff 0f 0b e9 cd fe ff ff <0f> 0b 0f 1f 00 e9 d9 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 [ 1393.519524] ---[ end trace 27db7bc3128451f2 ]--- [ 1393.524688] Kernel Offset: 0x2cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 1393.536734] ---[ end Kernel panic - not syncing: Fatal exception ]--- [ 1393.543930] ------------[ cut here ]------------ [ 1393.549081] sched: Unexpected reschedule of offline CPU#22! [ 1393.555305] WARNING: CPU: 52 PID: 5882 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40 [ 1393.565887] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ext4 mbcache jbd2 loop nfsd auth_rpcgss nfs_acl lockd grace intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd glue_helper ipmi_ssif iTCO_wdt mei_me iTCO_vendor_support mxm_wmi pcspkr joydev mei sg i2c_i801 shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad ip_tables xfs libcrc32c sr_mod cdrom sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ahci libahci libata crc32c_intel i2c_core mdio dca dm_mirror dm_region_hash dm_log dm_mod [ 1393.637594] CPU: 52 PID: 5882 Comm: nfsd Tainted: G D W 4.17.0-rc1 #1 [ 1393.646041] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 [ 1393.657398] RIP: 0010:native_smp_send_reschedule+0x39/0x40 [ 1393.663517] RSP: 0018:ffff92b49f983d08 EFLAGS: 00010086 [ 1393.669345] RAX: 0000000000000000 RBX: ffff92ad43139700 RCX: 0000000000000006 [ 1393.677305] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff92b49f9968b0 [ 1393.685267] RBP: 00000000000223c0 R08: 0000000000000000 R09: 00000000000005c6 [ 1393.693227] R10: 0000000000000000 R11: ffff92b49f983a68 R12: ffff92ad4313a25c [ 1393.701189] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000016 [ 1393.709153] FS: 0000000000000000(0000) GS:ffff92b49f980000(0000) knlGS:0000000000000000 [ 1393.718180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1393.724591] CR2: 000000000241768c CR3: 0000000830e0a002 CR4: 00000000003606e0 [ 1393.732551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1393.740514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1393.748475] Call Trace: [ 1393.751204] <IRQ> [ 1393.753449] try_to_wake_up+0x3d7/0x450 [ 1393.757729] __wake_up_common+0x8f/0x160 [ 1393.762107] ep_poll_callback+0xc4/0x2f0 [ 1393.766486] __wake_up_common+0x8f/0x160 [ 1393.770851] __wake_up_common_lock+0x7a/0xc0 [ 1393.775619] irq_work_run_list+0x4c/0x70 [ 1393.779998] ? tick_sched_do_timer+0x80/0x80 [ 1393.784756] update_process_times+0x3b/0x50 [ 1393.789424] tick_sched_handle+0x25/0x60 [ 1393.793800] tick_sched_timer+0x37/0x70 [ 1393.798080] __hrtimer_run_queues+0xfb/0x270 [ 1393.802845] hrtimer_interrupt+0xfb/0x230 [ 1393.807320] smp_apic_timer_interrupt+0x62/0x130 [ 1393.812471] apic_timer_interrupt+0xf/0x20 [ 1393.817041] </IRQ> [ 1393.819383] RIP: 0010:panic+0x206/0x25c [ 1393.823658] RSP: 0018:ffffbc06089b7ae8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 [ 1393.832105] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006 [ 1393.840065] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff92b49f9968b0 [ 1393.848026] RBP: ffffbc06089b7b58 R08: 0000000000000000 R09: 00000000000005c4 [ 1393.855989] R10: 0000000000000000 R11: ffffbc06089b7850 R12: ffffffffaea6025c [ 1393.863950] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1393.871913] ? panic+0x1ff/0x25c [ 1393.875515] oops_end+0xb0/0xc0 [ 1393.879019] general_protection+0x1e/0x30 [ 1393.883495] RIP: 0010:prefetch_freepointer.isra.65+0xc/0x20 [ 1393.889703] RSP: 0018:ffffbc06089b7c30 EFLAGS: 00010286 [ 1393.895532] RAX: 0000000000000000 RBX: ff92b498cc49f800 RCX: 0000000000000183 [ 1393.903494] RDX: 0000000000000182 RSI: ff92b498cc49f800 RDI: ffff92b49f87de20 [ 1393.911454] RBP: 00000000014080c0 R08: 000000000002dfc0 R09: ffff92b479731070 [ 1393.919417] R10: 0000000001ce3140 R11: 0000000000000000 R12: ffff92b498cc490f [ 1393.927379] R13: ffff92b49f87de00 R14: ffffffffc0713fe3 R15: ffff92b49f87de00 [ 1393.935354] ? nfs4_alloc_stid+0x23/0xb0 [nfsd] [ 1393.940402] kmem_cache_alloc+0xa3/0x1c0 [ 1393.944784] ? nfsd4_free_file_rcu+0x20/0x20 [nfsd] [ 1393.950230] nfs4_alloc_stid+0x23/0xb0 [nfsd] [ 1393.955095] nfsd4_process_open2+0xc6b/0x12a0 [nfsd] [ 1393.960635] ? fh_verify+0x33e/0x5f0 [nfsd] [ 1393.965304] ? nfsd4_process_open1+0x137/0x400 [nfsd] [ 1393.970944] nfsd4_open+0x4ce/0x790 [nfsd] [ 1393.975518] ? nfsd4_encode_operation+0xdf/0x1c0 [nfsd] [ 1393.981351] nfsd4_proc_compound+0x5aa/0x7b0 [nfsd] [ 1393.986795] nfsd_dispatch+0xb7/0x250 [nfsd] [ 1393.991568] svc_process_common+0x399/0x750 [sunrpc] [ 1393.997114] svc_process+0xeb/0x100 [sunrpc] [ 1394.001880] nfsd+0xe3/0x150 [nfsd] [ 1394.005771] kthread+0xf8/0x130 [ 1394.009277] ? nfsd_destroy+0x60/0x60 [nfsd] [ 1394.014038] ? kthread_bind+0x10/0x10 [ 1394.018122] ret_from_fork+0x35/0x40 [ 1394.022108] Code: 01 0f 92 c0 84 c0 74 15 48 8b 05 73 53 ed 00 be fd 00 00 00 48 8b 40 30 e9 25 d9 9a 00 89 fe 48 c7 c7 f8 82 a6 ae e8 c7 d1 03 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 08 01 48 89 fb [ 1394.043183] ---[ end trace 27db7bc3128451f3 ]--- [-- MARK -- Fri Apr 20 20:20:00 2018] [-- MARK -- Fri Apr 20 20:25:00 2018] '''
test script: ''' svn export https://github.com/hpc/lustre/trunk/lustre/tests/racer lustre-racer.orig BZ=lustre-racer loopf=/tmp/loop-$BZ nfsexpdir=/mnt/lustre nfsmp=/mnt/nfsmp nfsmp2=/mnt/nfsmp2 mntSrc1=${REMOTE_SHARE:-localhost:$nfsexpdir} mntSrc2=${REMOTE_SHARE2:-localhost:/} mkdir -p $nfsexpdir $nfsmp $nfsmp2 echo "$nfsexpdir *(rw,no_root_squash,fsid=0)" >/etc/exports dd if=/dev/zero of=$loopf bs=1024k count=1024 mkfs.ext4 -F $loopf service rpcbind start mount none /var/lib/nfs -t tmpfs touch /var/lib/nfs/etab # start nfs-mountd.service service_nfs start pushd lustre-racer.orig while :; do date mount $loopf $nfsexpdir -o loop || exit 1 mkdir $nfsexpdir/racer # start nfs-server.service service_nfs restart || exit 2 mount $mntSrc1 $nfsmp -t nfs -o nolock || exit 3 mount $mntSrc2 $nfsmp2 -t nfs4 || exit 4 DURATION=3600 sh racer.sh $nfsmp/racer & DURATION=3600 sh racer.sh $nfsmp2/racer & DURATION=3600 sh racer.sh $nfsexpdir/racer & wait %1 %2 %3 umount $nfsmp || exit 5 umount $nfsmp2 || exit 6 # stop nfs-server.service service_nfs stop || exit 7 umount $nfsexpdir || exit 8 e2fsck -n -f $loopf || exit 9 done popd '''
Created attachment 275471 [details] reproducer How reproducable: 20+%