Bug 219228 - KVM guest boots up with call traces with an NFS-mounted image because the NFS server connection is not stable
Summary: KVM guest boots up with call traces with an NFS-mounted image because the NFS server connection is not stable
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All
OS: Linux
Importance: P3 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-04 02:32 UTC by xuelian.guo@intel.com
Modified: 2024-12-11 04:20 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description xuelian.guo@intel.com 2024-09-04 02:32:26 UTC
Environment:
host kernel: 6.11.0-rc3
guest kernel: 6.11.0-rc5
qemu version: QEMU emulator version 9.0.92 (v9.1.0-rc2-28-g2eefd4fcec)


Bug detail description:
When booting a KVM guest, call traces appear during kernel loading.

Reproduce steps: 
1. Set up an NFS server and put the guest image "centos9.img" into the NFS server's /images folder.

2. Mount the NFS server's export locally via this /etc/fstab entry (a one-off mount equivalent is sketched after step 4):
   <NFS-server>:/images /share/guest_imgs nfs defaults,tcp,nolock 0 0

3. Create a qcow2 overlay image backed by the raw image on the NFS mount:
    qemu-img create -f qcow2 -b /share/guest_imgs/centos9.img -F raw centos9.qcow2
 
4. Boot the KVM guest with:
qemu-system-x86_64 -accel kvm \
-m 4096 \
-smp 4 \
-cpu host \
-drive file=centos9.qcow2,if=none,id=virtio-disk0 \
-device virtio-blk-pci,drive=virtio-disk0,bootindex=0 \
-device virtio-net-pci,netdev=nic0 \
-netdev user,id=nic0,hostfwd=tcp::10022-:22 \
-nographic
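
For reference, a one-off equivalent of the fstab entry in step 2 (a sketch; <NFS-server> is a placeholder for the actual server address, and the options mirror the fstab line above):

   mkdir -p /share/guest_imgs
   mount -t nfs -o tcp,nolock <NFS-server>:/images /share/guest_imgs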

Error log:
During KVM guest booting there are call traces; the guest does boot up in the end, but the boot time is long.
 INFO: task kworker/u8:1:29 blocked for more than 122 seconds.
[  246.987403]       Not tainted 6.11.0-rc5 #2
[  246.989674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.992986] task:kworker/u8:1    state:D stack:0     pid:29    tgid:29    ppid:2      flags:0x00004000
[  246.996842] Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs]
[  247.000029] Call Trace:
[  247.001707]  <TASK>
[  247.003263]  __schedule+0x276/0x710
[  247.005381]  schedule+0x27/0xa0
[  247.007300]  xlog_state_get_iclog_space+0x102/0x2a0 [xfs]
[  247.010298]  ? __pfx_default_wake_function+0x10/0x10
[  247.012800]  xlog_write+0x7b/0x260 [xfs]
[  247.015379]  ? _raw_spin_unlock+0xe/0x30
[  *** ] (5 of 5) A start job is running for…ernel arming (3min 39s / no limit)
[  247.023496]  xlog_cil_push_work+0x50e/0x6f0 [xfs]
[  247.026329]  process_one_work+0x158/0x360
[  247.028584]  worker_thread+0x235/0x340
[  247.030710]  ? __pfx_worker_thread+0x10/0x10
[  247.032997]  kthread+0xd0/0x100
[  247.034937]  ? __pfx_kthread+0x10/0x10
[  247.037063]  ret_from_fork+0x31/0x50
[  247.039161]  ? __pfx_kthread+0x10/0x10
[  247.041302]  ret_from_fork_asm+0x1a/0x30
[  247.043479]  </TASK>
[  247.045019] INFO: task kworker/0:3:255 blocked for more than 122 seconds.
[  247.047775]       Not tainted 6.11.0-rc5 #2
[  247.049802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.052903] task:kworker/0:3     state:D stack:0     pid:255   tgid:255   ppid:2      flags:0x00004000
[  247.056403] Workqueue: xfs-sync/dm-0 xfs_log_worker [xfs]
[  247.058880] Call Trace:
[  247.060293]  <TASK>
[  247.061638]  __schedule+0x276/0x710
[  247.063332]  schedule+0x27/0xa0
[  247.064964]  schedule_timeout+0x14e/0x160
[  247.066778]  __wait_for_common+0x8f/0x1d0
[  247.068569]  ? __pfx_schedule_timeout+0x10/0x10
[  247.070332]  __flush_workqueue+0x13e/0x3f0
[  247.072011]  xlog_cil_push_now.isra.0+0x25/0x90 [xfs]
[  247.074185]  xlog_cil_force_seq+0x79/0x240 [xfs]
[  247.076253]  ? rpm_suspend+0x1ad/0x5e0
[  247.077831]  ? _raw_spin_unlock+0xe/0x30
[  247.079442]  ? rpm_suspend+0x5b6/0x5e0
[  247.080987]  xfs_log_force+0x7a/0x230 [xfs]
[  247.082764]  xfs_log_worker+0x3d/0xc0 [xfs]
[  247.084564]  process_one_work+0x158/0x360
[  247.086075]  worker_thread+0x235/0x340
[  247.087462]  ? __pfx_worker_thread+0x10/0x10
[  247.088895]  kthread+0xd0/0x100
[  247.090108]  ? __pfx_kthread+0x10/0x10
[  247.091410]  ret_from_fork+0x31/0x50
[  247.092665]  ? __pfx_kthread+0x10/0x10
[  247.093932]  ret_from_fork_asm+0x1a/0x30
[  247.095251]  </TASK>
[  247.096232] INFO: task xfsaild/dm-0:543 blocked for more than 122 seconds.
[  247.099379]       Not tainted 6.11.0-rc5 #2
[  247.101574] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.104908] task:xfsaild/dm-0    state:D stack:0     pid:543   tgid:543   ppid:2      flags:0x00004000
[  247.108707] Call Trace:
[  247.110363]  <TASK>
[  247.111914]  __schedule+0x276/0x710
[  247.113892]  schedule+0x27/0xa0
[  247.115756]  schedule_timeout+0x14e/0x160
[  247.117899]  __wait_for_common+0x8f/0x1d0
[  247.119998]  ? __pfx_schedule_timeout+0x10/0x10
[  247.122276]  __flush_workqueue+0x13e/0x3f0
[  247.124457]  ? ttwu_queue_wakelist+0xd0/0xf0
[  247.126684]  xlog_cil_push_now.isra.0+0x25/0x90 [xfs]
[  247.129559]  xlog_cil_force_seq+0x79/0x240 [xfs]
[  247.132238]  ? _raw_spin_unlock+0xe/0x30
[  247.134300]  ? finish_task_switch.isra.0+0x97/0x2a0
[  247.136675]  xfs_log_force+0x7a/0x230 [xfs]
[  247.139251]  xfsaild_push+0x2d7/0x850 [xfs]
[  247.141753]  ? __timer_delete_sync+0x2b/0x40
[  247.143931]  ? schedule_timeout+0x99/0x160
[  247.146023]  ? __pfx_process_timeout+0x10/0x10
[  247.148238]  xfsaild+0xaf/0x140 [xfs]
[  247.150566]  ? __pfx_xfsaild+0x10/0x10 [xfs]
[  247.153062]  kthread+0xd0/0x100
[  247.154785]  ? __pfx_kthread+0x10/0x10
[  247.156715]  ret_from_fork+0x31/0x50
[  247.158404]  ? __pfx_kthread+0x10/0x10
[  247.160147]  ret_from_fork_asm+0x1a/0x30
[  247.161914]  </TASK>
[  247.163238] INFO: task in:imjournal:787 blocked for more than 123 seconds.
[  247.165818]       Not tainted 6.11.0-rc5 #2
[  247.167697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.170408] task:in:imjournal    state:D stack:0     pid:787   tgid:745   ppid:1      flags:0x00000002
[  247.173404] Call Trace:
[  247.174673]  <TASK>
[  247.175844]  __schedule+0x276/0x710
[  247.177441]  schedule+0x27/0xa0
[  247.178888]  schedule_timeout+0x14e/0x160
[  247.180562]  ? update_sd_lb_stats.constprop.0+0x66/0x270
[  247.182469]  ___down_common+0x110/0x170
[  247.183983]  __down_common+0x1e/0xc0
[  247.185435]  down+0x47/0x60
[  247.186684]  xfs_buf_lock+0x31/0xe0 [xfs]
[  247.188553]  xfs_buf_find_lock+0x59/0x110 [xfs]
[  247.190532]  xfs_buf_lookup.constprop.0+0xb9/0x180 [xfs]
[  247.192666]  xfs_buf_get_map+0xea/0x5f0 [xfs]
[  247.194510]  xfs_buf_read_map+0x58/0x2a0 [xfs]
[  247.196315]  ? xfs_btree_read_buf_block+0xa2/0x120 [xfs]
[  247.198282]  ? xfs_trans_add_item+0x37/0xb0 [xfs]
[  247.200066]  xfs_trans_read_buf_map+0x119/0x300 [xfs]
[  247.201813]  ? xfs_btree_read_buf_block+0xa2/0x120 [xfs]
[  247.203618]  xfs_btree_read_buf_block+0xa2/0x120 [xfs]
[  247.205404]  xfs_btree_lookup_get_block.part.0+0x81/0x1b0 [xfs]
[  247.207331]  xfs_btree_lookup+0x102/0x550 [xfs]
[  247.208969]  ? kmem_cache_alloc_noprof+0xdc/0x2b0
[  247.210404]  xfs_dialloc_ag_update_inobt+0x47/0x170 [xfs]
[  247.212187]  ? xfs_inobt_init_cursor+0x67/0xa0 [xfs]
[  247.213808]  xfs_dialloc_ag+0x1ad/0x2e0 [xfs]
[  247.215343]  xfs_dialloc+0x24b/0x3b0 [xfs]
[  247.216798]  xfs_create+0x167/0x490 [xfs]
[  247.218193]  ? xfs_dir2_format+0x4a/0x130 [xfs]
[  247.219644]  xfs_generic_create+0x30d/0x360 [xfs]
[  247.221129]  lookup_open.isra.0+0x4e1/0x600
[  247.222312]  open_last_lookups+0x139/0x440
[  247.223461]  path_openat+0x88/0x290
[  247.224451]  do_filp_open+0xae/0x150
[  247.225454]  do_sys_openat2+0x96/0xd0
[  247.226472]  __x64_sys_openat+0x57/0xa0
[  247.227515]  do_syscall_64+0x64/0x170
[  247.228524]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  247.229778] RIP: 0033:0x7fbcacb3e8d4
[  247.230763] RSP: 002b:00007fbcab7fea40 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[  247.232427] RAX: ffffffffffffffda RBX: 00007fbc9c000c10 RCX: 00007fbcacb3e8d4
[  247.234030] RDX: 0000000000000241 RSI: 00007fbcab7feb80 RDI: 00000000ffffff9c
[  247.235626] RBP: 00007fbcab7feb80 R08: 0000000000000000 R09: 0000000000000001
[  247.237174] R10: 00000000000001b6 R11: 0000000000000293 R12: 0000000000000241
[  247.238682] R13: 00007fbc9c000c10 R14: 0000000000000001 R15: 00007fbc9c015c40
[  247.240204]  </TASK>
[  247.242323] INFO: task updatedb:769 blocked for more than 123 seconds.
[  247.243697]       Not tainted 6.11.0-rc5 #2
[  247.244694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.246246] task:updatedb        state:D stack:0     pid:769   tgid:769   ppid:756    flags:0x00000002
[  247.247995] Call Trace:
[  247.248689]  <TASK>
[  247.249342]  __schedule+0x276/0x710
[  247.250192]  ? vp_notify+0x16/0x20 [virtio_pci]
[  247.251208]  schedule+0x27/0xa0
[  247.252006]  schedule_timeout+0x14e/0x160
[  247.252926]  ? __blk_flush_plug+0xf5/0x150
[  247.253854]  __wait_for_common+0x8f/0x1d0
[  247.254726]  ? __pfx_schedule_timeout+0x10/0x10
[  247.255678]  xfs_buf_iowait+0x1c/0xc0 [xfs]
[  247.256797]  __xfs_buf_submit+0x131/0x1e0 [xfs]
[  247.257933]  xfs_buf_read_map+0x11e/0x2a0 [xfs]
[  247.259044]  ? xfs_da_read_buf+0xee/0x170 [xfs]
[  247.260178]  xfs_trans_read_buf_map+0x119/0x300 [xfs]
[  247.261306]  ? xfs_da_read_buf+0xee/0x170 [xfs]
[  247.262375]  xfs_da_read_buf+0xee/0x170 [xfs]
[  247.263430]  xfs_dir3_block_read+0x3c/0xf0 [xfs]
[  247.264506]  xfs_dir2_block_lookup_int+0x4c/0x1d0 [xfs]
[  247.265658]  ? xfs_bmap_last_offset+0x6b/0x110 [xfs]
[  247.266737]  xfs_dir2_block_lookup+0x3b/0x130 [xfs]
[  247.267795]  xfs_dir_lookup_args+0x3e/0x90 [xfs]
[  247.268810]  xfs_dir_lookup+0x139/0x160 [xfs]
[  247.269798]  xfs_lookup+0x94/0x160 [xfs]
[  247.270722]  ? avc_has_perm_noaudit+0x6b/0xf0
[  247.271570]  xfs_vn_lookup+0x78/0xb0 [xfs]
[  247.272504]  __lookup_slow+0x81/0x130
[  247.273242]  walk_component+0xe5/0x160
[  247.273969]  path_lookupat+0x6e/0x1c0
[  247.274685]  filename_lookup+0xcd/0x1c0
[  247.275428]  ? selinux_inode_getattr+0x9a/0xc0
[  247.276249]  ? terminate_walk+0x21/0x100
[  247.277002]  ? xfs_vn_getattr+0x66/0x340 [xfs]
[  247.277968]  ? cp_new_stat+0x14f/0x180
[  247.278682]  vfs_statx+0x75/0xe0
[  247.279332]  vfs_fstatat+0x6c/0xb0
[  247.279989]  __do_sys_newfstatat+0x26/0x60
[  247.280722]  ? __seccomp_filter+0x3d8/0x5b0
[  247.281473]  ? syscall_trace_enter+0xff/0x190
[  247.282270]  do_syscall_64+0x64/0x170
[  247.282958]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  247.283805] RIP: 0033:0x7f57f593e13e
[  247.284479] RSP: 002b:00007ffccb8f2bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[  247.285541] RAX: ffffffffffffffda RBX: 0000564a28d7a9f0 RCX: 00007f57f593e13e
[  247.286569] RDX: 00007ffccb8f2c30 RSI: 0000564a28d88fb9 RDI: 00000000ffffff9c
[  247.287595] RBP: 0000564a28d88fb9 R08: 0000000000000000 R09: 00005649ecdb34e0
[  247.288601] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000006
[  247.289599] R13: 00007ffccb8f2e00 R14: 0000000000000006 R15: 0000000000000006

Note:
We verified that this issue does not reproduce when the host kernel is reverted to 6.10.0-rc7.
Comment 1 xuelian.guo@intel.com 2024-09-18 06:02:43 UTC
In most KVM usage scenarios, guest images are stored on an NFS server, so this issue causes KVM guests to boot with call traces whenever the guest image lives on a remote NFS server.
Comment 2 ming4.li 2024-09-19 06:35:13 UTC
Hi NFS maintainers,

I am working with Xuelian.Guo to debug this issue. I found that a VM needs much more time to finish booting on a host running a 6.11-rcX kernel.

VM bootup on a host with the 6.10 kernel: 15-30 s
VM bootup on a host with a 6.11-rcX kernel: 5-8 min

I noticed that a lot of extra network traffic was generated during VM bootup with the 6.11-rcX host kernel.
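
The per-interface traffic tables below can be collected with a dstat-style monitor; assuming dstat is available, something like this produces the format shown (one timestamped row per second, recv/send bytes for lo and eno0):

   dstat -t -n -N lo,eno0 1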

VM bootup on the host with 6.10 kernel:
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 10:43:17|           :
19-09 10:43:18|   0     0 :1044k   11k
19-09 10:43:19|   0     0 : 994k   12k
19-09 10:43:20|   0     0 : 560k   46k
19-09 10:43:21|   0     0 : 192B 4225B
19-09 10:43:22|   0     0 : 362B 3884B
19-09 10:43:23|   0     0 : 192B 3266B
19-09 10:43:24|   0     0 : 192B 4034B
19-09 10:43:25|   0     0 :1121k   19k
19-09 10:43:26|   0     0 : 806k 4274B
19-09 10:43:27|   0     0 :  11M   60k
19-09 10:43:28|   0     0 :  40M  222k
19-09 10:43:29|   0     0 :  18M  616k
19-09 10:43:30|   0     0 :7710B  182k
19-09 10:43:31|   0     0 :1558k   56k
19-09 10:43:32|   0     0 :2958k   66k
19-09 10:43:33|   0     0 :3379k   71k
19-09 10:43:34|   0     0 :2215k   35k
19-09 10:43:35|   0     0 :9636k   72k
19-09 10:43:36|   0     0 :  17M  161k
19-09 10:43:37|   0     0 :  12M  106k
19-09 10:43:38|   0     0 :1564k   56k
19-09 10:43:39|   0     0 :  10M  313k
19-09 10:43:40|   0     0 :  15M  114k
19-09 10:43:41|   0     0 :  11M  123k
19-09 10:43:42|   0     0 :  32M  330k
19-09 10:43:43|   0     0 :  37M  410k
19-09 10:43:44|   0     0 :  30M  380k
19-09 10:43:45|   0     0 :  12M  184k
19-09 10:43:46|   0     0 :  52M  365k
19-09 10:43:47|   0     0 :  37M  399k
19-09 10:43:48|   0     0 :  70M  457k
19-09 10:43:49|   0     0 :  34M  282k
19-09 10:43:50|   0     0 :  47M  357k
19-09 10:43:51|   0     0 :  22M  137k
19-09 10:43:52|   0     0 :8174k   56k
19-09 10:43:53|   0     0 :8999k   92k
19-09 10:43:54|   0     0 :2844k   50k
19-09 10:43:55|   0     0 :2073k   90k
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 10:43:56|   0     0 : 177k 1427B



VM bootup on the host with 6.11-rcX kernel
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 09:28:17|           :
19-09 09:28:18|   0     0 : 505k   95k
19-09 09:28:19|   0     0 : 192B 3682B
19-09 09:28:20|   0     0 :1606B 3768B
19-09 09:28:21|   0     0 :2080B 3848B
19-09 09:28:22|   0     0 :3324B 3448B
19-09 09:28:23|   0     0 :1526k   18k
19-09 09:28:24|   0     0 : 807k 3434B
19-09 09:28:25|   0     0 :1078k 4678B
19-09 09:28:26|   0     0 : 807k 3294B
19-09 09:28:27|   0     0 :1076k 5303B
19-09 09:28:28|   0     0 :6449k   30k
19-09 09:28:29|   0     0 :3753k   18k
19-09 09:28:30|   0     0 :2151k 9234B
19-09 09:28:31|   0     0 :  29M  206k
19-09 09:28:32|   0     0 :  19M  101k
19-09 09:28:33|   0     0 :5654k  889k
19-09 09:28:34|   0     0 :  80k  171k
19-09 09:28:35|   0     0 :3313k   57k
19-09 09:28:36|   0     0 :4548k   53k
19-09 09:28:37|   0     0 :1616k   71k
19-09 09:28:38|   0     0 :4235k   36k
19-09 09:28:39|   0     0 :  14M   93k
19-09 09:28:40|   0     0 :  20M  232k
19-09 09:28:41|   0     0 :3985k  191k
19-09 09:28:42|   0     0 :9599k  235k
19-09 09:28:43|   0     0 :  16M  190k
19-09 09:28:44|   0     0 :  14M  224k
19-09 09:28:45|   0     0 :  43M  585k
19-09 09:28:46|   0     0 : 116M 3122k
19-09 09:28:47|   0     0 : 117M 1844k
19-09 09:28:48|   0     0 : 117M 1228k
19-09 09:28:49|   0     0 : 117M  313k
19-09 09:28:50|   0     0 : 117M  317k
19-09 09:28:51|   0     0 : 117M  310k
19-09 09:28:52|   0     0 : 117M  310k
19-09 09:28:53|   0     0 : 117M  308k
19-09 09:28:54|   0     0 : 117M  318k
19-09 09:28:55|   0     0 : 117M  311k
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 09:28:56|   0     0 : 117M  321k
19-09 09:28:57|   0     0 : 117M  320k
19-09 09:28:58|   0     0 : 117M  321k
19-09 09:28:59|   0     0 : 117M  314k
19-09 09:29:00|   0     0 : 117M  313k
19-09 09:29:01|   0     0 : 117M  315k
19-09 09:29:02|   0     0 : 117M  313k
19-09 09:29:03|   0     0 : 117M  311k
19-09 09:29:04|   0     0 : 117M  320k
19-09 09:29:05|   0     0 : 117M  315k
19-09 09:29:06|   0     0 : 117M  315k
19-09 09:29:07|   0     0 : 117M  321k
19-09 09:29:08|   0     0 : 117M  324k
19-09 09:29:09|   0     0 : 117M  327k
19-09 09:29:10|   0     0 : 117M  328k
19-09 09:29:11|   0     0 : 117M  324k
19-09 09:29:12|   0     0 : 117M  324k
19-09 09:29:13|   0     0 : 117M  321k
19-09 09:29:14|   0     0 : 117M  324k
19-09 09:29:15|   0     0 : 117M  315k
19-09 09:29:16|   0     0 : 117M  318k
19-09 09:29:17|   0     0 : 117M  324k
19-09 09:29:18|   0     0 : 117M  323k
19-09 09:29:19|   0     0 : 117M  323k
19-09 09:29:20|   0     0 : 117M  321k
19-09 09:29:21|   0     0 : 117M  323k
19-09 09:29:22|   0     0 : 117M  317k
19-09 09:29:23|   0     0 : 117M  323k
19-09 09:29:24|   0     0 : 117M  323k
19-09 09:29:25|   0     0 : 117M  329k
19-09 09:29:26|   0     0 : 117M  319k
19-09 09:29:27|   0     0 : 117M  319k
19-09 09:29:28|   0     0 : 117M  320k
19-09 09:29:29|   0     0 : 117M  323k
19-09 09:29:30|   0     0 : 117M  315k
19-09 09:29:31|   0     0 : 117M  328k
19-09 09:29:32|   0     0 : 117M  330k
[...]
19-09 09:34:24|   0     0 : 117M  295k
19-09 09:34:25|   0     0 : 117M  280k
19-09 09:34:26|   0     0 : 117M  309k
19-09 09:34:27|   0     0 : 117M  332k
19-09 09:34:28|   0     0 : 117M  328k
19-09 09:34:29|   0     0 : 117M  327k
19-09 09:34:30|   0     0 : 117M  329k
19-09 09:34:31|   0     0 : 117M  304k
19-09 09:34:32|   0     0 : 117M  316k
19-09 09:34:33|   0     0 : 117M  299k
19-09 09:34:34|   0     0 : 117M  282k
19-09 09:34:35|   0     0 : 116M  287k
19-09 09:34:36|   0     0 :  99M  275k
19-09 09:34:37|   0     0 :  70M  199k
19-09 09:34:38|   0     0 :  79M  223k
19-09 09:34:39|   0     0 :  79M  221k
19-09 09:34:40|   0     0 :  80M  235k
19-09 09:34:41|   0     0 :  80M  223k
19-09 09:34:42|   0     0 :  80M  217k
19-09 09:34:43|   0     0 :  71M  633k
19-09 09:34:44|   0     0 :  77M  217k
19-09 09:34:45|   0     0 :  71M  201k
19-09 09:34:46|   0     0 :  67M  199k
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 09:34:47|   0     0 :  56M  492k
19-09 09:34:48|   0     0 :  63M  358k
19-09 09:34:49|   0     0 :  60M  302k
19-09 09:34:50|   0     0 :  72M  242k
19-09 09:34:51|   0     0 :  61M  448k
19-09 09:34:52|   0     0 :  60M  409k
19-09 09:34:53|   0     0 :  64M  210k
19-09 09:34:54|   0     0 :  75M  224k
19-09 09:34:55|   0     0 :  72M  268k
19-09 09:34:56|   0     0 :  29M  164k
19-09 09:34:57|   0     0 :  14M  123k
19-09 09:34:58|   0     0 :  21M  159k
19-09 09:34:59|   0     0 :  17M  121k
19-09 09:35:00|   0     0 :4276k   43k
19-09 09:35:01|   0     0 :3635k   45k
19-09 09:35:02|   0     0 :4158k   44k
19-09 09:35:03|   0     0 :3258k   34k
19-09 09:35:04|   0     0 :4735k   49k
19-09 09:35:05|   0     0 :5461k   48k
19-09 09:35:06|   0     0 :3232k   35k
19-09 09:35:07|   0     0 :3592k   47k
19-09 09:35:08|   0     0 :6994k   63k
19-09 09:35:09|   0     0 :  13M   81k
19-09 09:35:10|   0     0 :7027k   49k
19-09 09:35:11|   0     0 :3619k   33k
19-09 09:35:12|   0     0 :6174k   54k
19-09 09:35:13|   0     0 :5848k   51k
19-09 09:35:14|   0     0 :5758k   48k
19-09 09:35:15|   0     0 :5561k   45k
19-09 09:35:16|   0     0 :4929k   43k
19-09 09:35:17|   0     0 :2289k   35k
19-09 09:35:18|   0     0 :6416k   71k
19-09 09:35:19|   0     0 :3861k   44k
19-09 09:35:20|   0     0 :4425k   43k
19-09 09:35:21|   0     0 :3406k   38k
19-09 09:35:22|   0     0 :2691k   34k
19-09 09:35:23|   0     0 :1491k   33k
19-09 09:35:24|   0     0 :2038k   35k
19-09 09:35:25|   0     0 :5195k   46k
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 09:35:26|   0     0 :3397k   38k
19-09 09:35:27|   0     0 :4983k   53k
19-09 09:35:28|   0     0 :7727k   78k
19-09 09:35:29|   0     0 :9750k  126k
19-09 09:35:30|   0     0 :8484k   95k
19-09 09:35:31|   0     0 :4493k  129k
19-09 09:35:32|   0     0 :6160k   41k
19-09 09:35:33|   0     0 :7808k   44k
19-09 09:35:34|   0     0 :8776k   57k
19-09 09:35:35|   0     0 :  17M   58k
19-09 09:35:36|   0     0 :5701k   33k
19-09 09:35:37|   0     0 :6359k   38k
19-09 09:35:38|   0     0 :7414k   46k
19-09 09:35:39|   0     0 :8863k   55k
19-09 09:35:40|   0     0 :6184k   30k
19-09 09:35:41|   0     0 :6919k   34k
19-09 09:35:42|   0     0 :7661k   45k
19-09 09:35:43|   0     0 :9455k   47k
19-09 09:35:44|   0     0 :8072k   46k
19-09 09:35:45|   0     0 :  32M   89k
19-09 09:35:46|   0     0 :4737k   24k
19-09 09:35:47|   0     0 :6952k   48k
19-09 09:35:48|   0     0 :7279k   80k
19-09 09:35:49|   0     0 :9034k   73k
19-09 09:35:50|   0     0 :  28M  106k
19-09 09:35:51|   0     0 :  19M   74k
19-09 09:35:52|   0     0 :  13M   91k
19-09 09:35:53|   0     0 :  11M   95k
19-09 09:35:54|   0     0 :  11M   81k
19-09 09:35:55|   0     0 :  11M   56k
19-09 09:35:56|   0     0 :  10M   78k
19-09 09:35:57|   0     0 :  16M   79k
19-09 09:35:58|   0     0 :  18M   80k
19-09 09:35:59|   0     0 :  11M   93k
19-09 09:36:00|   0     0 :3574k   24k
19-09 09:36:01|   0     0 :1802k   21k
19-09 09:36:02|   0     0 : 723k   15k
19-09 09:36:03|   0     0 :1601k   22k
19-09 09:36:04|   0     0 :1247k   18k
----system---- ---net/lo-----net/eno0-
     time     | recv  send: recv  send
19-09 09:36:05|   0     0 :1646k   22k
19-09 09:36:06|   0     0 :1820k   27k^C


We can see that all available network bandwidth (117M per second) was consumed from 09:28:47 to 09:34:34.
That is why some xfs call traces show up in the guest's dmesg: some xfs I/Os cannot finish in time, so the guest thinks something is wrong with xfs. Nothing actually is; the guest eventually boots up without any error.
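
(As the log itself suggests, the hung-task warnings, though not the underlying slowness, can be silenced while debugging:

   echo 0 > /proc/sys/kernel/hung_task_timeout_secs

This only suppresses the messages; the slow I/O remains.)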

I guessed that the extra network traffic came from NFS, so I tried reverting some patches and found that the issue goes away after reverting the patch below.

commit 49b29a573da83b65d5f4ecf2db6619bab7aa910c
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon May 27 18:36:09 2024 +0200
 
    nfs: add support for large folios
 
    NFS already is void of folio size assumption, so just pass the chunk size
    to __filemap_get_folio and set the large folio address_space flag for all
    regular files.
 
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
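
For reference, the revert can be done on a kernel tree along these lines (a sketch, assuming a checkout of the affected kernel followed by a rebuild of the host kernel):

   git revert 49b29a573da83b65d5f4ecf2db6619bab7aa910c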


I am not an NFS expert and am not sure why large folio support causes this problem. Could you please take a look at it? Thanks.
Comment 3 xuelian.guo@intel.com 2024-12-02 06:50:59 UTC
@Christoph Hellwig,

Could you check this issue? We double-checked that this issue is introduced by the following patch:
commit 49b29a573da83b65d5f4ecf2db6619bab7aa910c
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon May 27 18:36:09 2024 +0200
 
    nfs: add support for large folios
Comment 4 Chris Egolf 2024-12-11 04:20:01 UTC
I think this thread on the kernel mailing list may be related.

https://lore.kernel.org/all/49648605-d800-4859-be49-624bbe60519d@gmail.com/

I've been able to avoid the same symptoms by reverting the patch mentioned above, for VMs hosted on NFS shares with 6.11 and recent 6.12 kernels.
