Bug 117121 - performance problem file transfer nfs to nfs/system unusable
Summary: performance problem file transfer nfs to nfs/system unusable
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext3
Hardware: i386 Linux
Importance: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-24 17:21 UTC by Dieter Ferdinand
Modified: 2019-02-18 09:39 UTC
CC List: 1 user

See Also:
Kernel Version: 4.9.xx
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel config for kernel 4.9.22 with the bug (132.84 KB, text/x-mpsub)
2017-06-30 02:09 UTC, Dieter Ferdinand
kernel config for kernel 4.9.35 with the bug (132.88 KB, text/x-mpsub)
2017-06-30 02:10 UTC, Dieter Ferdinand
kernel config for kernel 4.9.23 without the bug (133.03 KB, text/x-mpsub)
2017-06-30 02:12 UTC, Dieter Ferdinand

Description Dieter Ferdinand 2016-04-24 17:21:19 UTC
Hello,
I have a big problem with my system (8 cores, 16 GB RAM).

I am trying to sync one NAS to another NAS with rsync over NFS mounts.
The slow sync speed itself is not a problem for me.

But some time after the synchronisation starts, the system becomes very slow: write performance to the hard disk drops below 3 MByte/s, and there are other problems.

After I had the problem a second time (the first time, I think, I was using CIFS mounts for the sync), I tried to find out what was happening.

After stopping rsync and running sync (which took over 60 seconds) to flush the write-back cache, the system is OK again.

At the moment, I use my old server (dual-core, 4 GB RAM) with kernel 2.6.32.7 to sync the two NAS systems with rsync and NFS mounts. On this system, I have no performance problems while rsync is running.

Is it possible that the slow write speed to the NAS, combined with the 16 GB of RAM, causes this problem?
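As a rough back-of-the-envelope check of that question (an estimate, not a measurement from this system): with the kernel's default vm.dirty_ratio of 20, the amount of dirty page cache allowed before writing processes are throttled grows with memory size. The sketch below treats all RAM as dirtyable, which overstates the real limit (the kernel computes it against dirtyable memory, not total RAM), and assumes, purely for illustration, a constant write-back rate of 3 MB/s (the report gives that figure for the local disk, not for the NAS).

```python
GIB = 1024 ** 3
MB = 1000 ** 2  # "MByte/s" taken as decimal megabytes (assumption)


def dirty_limit_bytes(ram_bytes: int, dirty_ratio_percent: int = 20) -> int:
    """Upper bound on dirty page cache: the ratio applied to ALL of RAM.

    Simplification: the real kernel limit uses dirtyable memory, which is smaller.
    """
    return ram_bytes * dirty_ratio_percent // 100


def drain_seconds(dirty_bytes: int, write_bytes_per_s: float) -> float:
    """Time needed to write back dirty_bytes at a constant throughput."""
    return dirty_bytes / write_bytes_per_s


if __name__ == "__main__":
    # Compare the old 4 GB server with the new 16 GB one.
    for ram_gib in (4, 16):
        limit = dirty_limit_bytes(ram_gib * GIB)
        seconds = drain_seconds(limit, 3 * MB)
        print(f"{ram_gib} GiB RAM: dirty limit ~{limit / GIB:.1f} GiB, "
              f"~{seconds / 60:.0f} min to drain at 3 MB/s")
```

Under these assumptions, 4 GiB versus 16 GiB works out to roughly 0.8 GiB versus 3.2 GiB of dirty data, or about 5 versus 19 minutes to drain. That would at least be consistent with the larger machine suffering more, but the actual limits on this kernel would need to be checked.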

Goodbye
Comment 1 bfields 2016-04-26 13:22:38 UTC
So you've seen similar problems on both NFS and CIFS?

That and the fact that it recovers after sync might suggest some sort of problem with generic writeback logic.

Wonder if it would be worth fooling with vm parameters (e.g., dirty_bytes, dirty_background_bytes?)
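To follow up on that suggestion, here is a minimal sketch of setting absolute dirty limits through the real /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes files. The 256 MiB and 64 MiB values used below are hypothetical starting points, not tested recommendations, and writing the files requires root. According to the kernel's vm sysctl documentation, writing dirty_bytes makes dirty_ratio read back as 0 (and likewise for the background pair), so only one form of each limit is active at a time. The function names are illustrative.

```python
from pathlib import Path

# Real sysctl directory; writing these files requires root.
VM_DIR = Path("/proc/sys/vm")


def dirty_limit_settings(dirty_bytes: int, background_bytes: int) -> dict[str, str]:
    """Build the sysctl values for absolute (byte-based) dirty limits.

    Background write-back should start well before the hard limit at which
    writing processes get throttled, so require background < hard limit.
    """
    if background_bytes <= 0 or background_bytes >= dirty_bytes:
        raise ValueError("need 0 < background_bytes < dirty_bytes")
    return {
        "dirty_background_bytes": str(background_bytes),
        "dirty_bytes": str(dirty_bytes),
    }


def apply_settings(settings: dict[str, str], vm_dir: Path = VM_DIR) -> None:
    """Write each value to its file under /proc/sys/vm."""
    for name, value in settings.items():
        (vm_dir / name).write_text(value + "\n")


def read_settings(names: list[str], vm_dir: Path = VM_DIR) -> dict[str, str]:
    """Read the current values back, e.g. to confirm dirty_ratio is now 0."""
    return {name: (vm_dir / name).read_text().strip() for name in names}
```

On a live system, running `apply_settings(dirty_limit_settings(256 * 2**20, 64 * 2**20))` as root, followed by `read_settings(["dirty_bytes", "dirty_ratio"])`, should show the new limit. `sysctl -w vm.dirty_bytes=268435456` does the same from a shell. Neither setting survives a reboot unless it is added to the sysctl configuration.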
Comment 2 Dieter Ferdinand 2016-04-26 15:23:46 UTC
Hello,
The first time I had this problem, I rebooted my server. At that time I also had a second problem with CIFS: the CIFS server (NAS) had crashed and I had to reboot it. After that, none of the processes that had accessed this mount could be killed, and I could not unmount the connection to mount it again. This problem was also why automount did not work: a mount command hung.

This, together with the very slow response of the server, is why I rebooted it.

After the reboot, I used NFS instead of CIFS to access my NAS, because I know how to resolve a hanging NFS mount without rebooting.

I restarted the data sync with rsync (or mc), and after some time my system became very slow again, so I tried to find the reason.

Symptoms: very slow hard disk access and a write rate below 3 MByte/s. I did not test read speed on its own. I use a tool to log the hard disk transfer rate, and I use unrar or unzip to extract an archive. Normally I get over 20 MB/s for this, but at that time it was very, very slow.
My tool also shows the dirty buffers for the hard disk, but there were no dirty buffers at that time. With this tool I previously detected a hard disk problem when one of my drives slowed down and had to be replaced; in that case, the dirty buffers stayed above 100 for some time.
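The reporter's monitoring tool is not named, so here is a hypothetical stand-in: the kernel's system-wide Dirty and Writeback counters can be sampled from /proc/meminfo. Kernels of this era also report NFS_Unstable there, for NFS pages sent to the server but not yet committed to stable storage. These are page-cache counters in kB and may not be the same metric as the per-disk "dirty buffers" the tool shows. The function names are illustrative.

```python
import time
from pathlib import Path

MEMINFO = Path("/proc/meminfo")
FIELDS = ("Dirty", "Writeback", "NFS_Unstable")  # skipped if the kernel lacks one


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse lines like 'Dirty:   1234 kB' into {'Dirty': 1234} (values in kB)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample(path: Path = MEMINFO) -> dict[str, int]:
    """Return only the write-back related counters from one read of meminfo."""
    info = parse_meminfo(path.read_text())
    return {field: info[field] for field in FIELDS if field in info}


def watch(samples: int = 12, interval: float = 5.0, path: Path = MEMINFO) -> None:
    """Print the counters `samples` times, `interval` seconds apart."""
    for i in range(samples):
        print(time.strftime("%H:%M:%S"), sample(path))
        if i + 1 < samples:
            time.sleep(interval)
```

Calling `watch()` while rsync runs would show whether Dirty or Writeback climbs toward the dirty limit when the slowdown begins. That is an assumption to test, not something the report establishes.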

Because I have had problems with write-back caches in the past (mostly on Windows, but also on DOS when a lot of data was written), I stopped my data synchronisation and ran sync. I think it ran for over 60 s; I did not time it.

After that, my system was fast again. I now run the data sync on a system that is used primarily for backup, so it does not matter if that system slows down during the synchronisation.

I can't provide more information: I don't have a second system with 16 GB RAM for more tests, and I can't use this system for testing. The last time I updated my kernel, a new bug in the XFS filesystem crashed my system more than five times in a week, so I had to repair my filesystems more than once and go back to the kernel I had used without problems for over a year before the update.

I think this problem does not happen on systems with less RAM, because on my second Linux server with kernel 2.6.x I have no problems after starting the data synchronisation.

Goodbye
Comment 3 Dieter Ferdinand 2016-04-27 15:08:29 UTC
Hello,
It seems that this bug only occurs when I copy from an NFS mount to an NFS mount.

If I copy from a hard disk to an NFS mount, I have no problems.

Goodbye
Comment 4 Dieter Ferdinand 2016-04-28 06:02:52 UTC
Hello,
I must correct myself. If I copy from hard disk to NFS, the system is slow when other processes are transferring a lot of data at the same time. But I don't know which of the two actions makes the system slow: copying from hard disk to hard disk, or from the network (CIFS) to hard disk.

Goodbye
Comment 5 Dieter Ferdinand 2017-06-30 02:09:24 UTC
Created attachment 257235 [details]
kernel config for kernel 4.9.22 with the bug
Comment 6 Dieter Ferdinand 2017-06-30 02:10:05 UTC
Created attachment 257237 [details]
kernel config for kernel 4.9.35 with the bug
Comment 7 Dieter Ferdinand 2017-06-30 02:12:06 UTC
Created attachment 257239 [details]
kernel config for kernel 4.9.23 without the bug
