Bug 14830
Summary: When other IO is running, sync times go to 10 to 20 minutes
Product: File System
Component: ext4
Reporter: Michael Godfrey (godfrey)
Assignee: fs_ext4 (fs_ext4)
Status: CLOSED OBSOLETE
Severity: normal
Priority: P1
CC: agk, alan, david, haircut, jack, jmaggard10, linux-kernel-bugs, neil.broomfield, rhuddusa, sandeen, tytso
Hardware: All
OS: Linux
Kernel Version: 2.6.31.6-166.fc12.x86_64
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
- description of system and excerpts from /var/log/messages
- Requested iostat.log file
- messages output with sync running
- Fix long sync times during heavy writing
- additional data from kernel 2.6.32.9-67.fc12.x86_64
- Patch to stop background writeback when other work is queued for the thread
- Patch to stop background writeback when other work is queued for the thread
- Patch to stop background writeback when other work is queued for the thread
filesystem or nfs perhaps?

There are two issues here. The first is that sync takes a long time - this is at the level of "don't do it when it hurts" ;). When you do heavy writing and call sync, it simply takes a long time to flush all the caches to disk. If you think the time is inappropriately long, we can have a look at it, but for that we'd need much more detail: the amount and nature of the data written (many small files vs. a few large ones), the time it takes sync to complete, the speed of the disks for sequential IO, and so on.

The second issue is that nfsd blocks as well. Partly this may be because sync blocks writers (so that it can get its work done in a finite time); partly it may be a limitation of ext4, because all metadata writes go through a journal of limited size, so data has to be copied from the journal to its final location on disk once in a while, and that usually leads to all writer processes blocking while they wait for journal space to be freed (which can take longer when the 'sync' process is keeping the disk busy with data writes).

Note that the line in the log file, "This problem prevents production use of systems using this kernel," evokes a question: do you have a kernel which behaved better for you? Which one?

> This problem prevents production use of systems using this kernel.
> evokes a question: Do you have a kernel which behaved better for you? Which one?

Yes. RHEL5.4 does not show this problem. It is the production system that works in this environment.

The response above is disappointing. Is a sync response of 20 minutes, including several task timeouts, to be considered "normal"?

> If you think the time is inappropriately long, we can have a look at it
> but for that we'd need much more details like amount and nature of data written
> (many small files vs a few large ones), time it takes sync to complete, speed
> of disks for sequential IO...

I am sorry to have to tell you that in this environment we do not deal exclusively in small or large files; we have quite a few of both. When an rsync which transfers about 50GB of files of various sizes is running, the hung condition is continuous until the rsync completes. This is just a pretty typical load. You could try it yourself; no special file sizes are required. I think I mentioned that the ext4 LVM is a RAID 50 on a 3ware 9650SE-8LPML with 8 2T drives. Its throughput for reading and writing is good when the system is not locked up.

Reply-To: cslee-list@cybericom.co.uk

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14830
>
> --- Comment #3 from Michael Godfrey <godfrey@isl.stanford.edu> 2010-01-18 23:58:09 ---
>
> When an rsync which transfers about 50GB of files of various sizes is
> running, the hung condition is continuous until the rsync completes.
> [...]
> Its throughput for reading and writing is good when the system is not locked up.

Is it possible that it is something along the lines of what is described at this link: http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

If so a runtime adjustment might help you out.

Chris.

So the "Linux write cache mystery" is unlikely to solve the problem, since that post was about backporting some tuning parameters from 2.6.22 to the RHEL/CentOS 2.6.18 kernel, and here the problem is that sync takes much longer on a 2.6.31 FC 12 kernel.

The first thing I notice is that you have the nodelalloc mount option enabled. Any particular reason why you did that? Try removing it; one of the reasons why ext4 is generally described as being much better than ext3 with respect to this problem (the machine becoming unresponsive during a sync) is delayed allocation, and you've turned it off. So try removing nodelalloc and see whether the performance comes back.

Another thing that might be worth testing is whether an ext3 filesystem on a 2.6.31 FC 12 kernel behaves any differently. This may be some kind of VM tuning issue between RHEL 5.4 and a modern kernel; I don't know how many people try running Fedora 12 on a system with large amounts of memory and an NFS load, and maybe there is some kind of tuning issue that has been exposed. That's a quick experiment worth doing just so we can figure out where we need to concentrate our diagnostic efforts.

Another thing to try is some instrumentation using iostat, to see what the system is doing before, during, and after the sync command.

> So the first thing I notice is the fact that you have the nodelalloc mount
> option enabled. Any particular reason why you did that?

This was required due to an error in 2.6.30. It is possible that it is no longer needed. I will check.

It is not really feasible to revert to ext3, other than going back to RHEL5, which is what has been done. One of the main purposes of using FC11 was ext4: fsck under ext3 takes about 8 hours, under ext4 about 9 minutes, and ext4 has other well-advertised advantages.

> Another thing to try is to do some instrumentation using iostat to see what
> the system is doing, before, during, and after the sync command.

I have tried this to some extent. It is not too easy when response is extremely slow.

I tried the advice above about nodelalloc using a newer kernel. It had no observable effect, but it did not cause problems as it had before, so I left it turned off. I also installed kernel 2.6.31.12-174.2.3.fc12.x86_64. None of this helped: sync times are still at 20 minutes during rsync, and there are task (sync or rsync) timeouts in /var/log/messages. I guess that patience is called for here.

Hmm, can you run "iostat 1 | tee iostat.log &", and while that is running, wait 15 seconds or so, so we can capture what things look like in steady state, then type "sync", and note in the iostat.log file when the sync command was initiated? It would be useful to see what this looks like on both your ext3 production server and on the ext4 test server.
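A minimal way to run that capture (a sketch only; the log path and the 15-second settle time are illustrative, and it assumes the heavy-write workload is already running):

    # one-second iostat samples, logged while the heavy-write workload runs
    iostat 1 | tee /tmp/iostat.log &

    sleep 15                    # capture ~15 s of steady-state samples first

    date '+sync started: %T'    # note the start time so it can be lined up with the samples
    time sync
    date '+sync finished: %T'

    kill %1                     # stop the background iostat pipeline (interactive shell job)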
Do you know whether there are any other differences between the two systems, in terms of the workload seen by your production server versus your test server?

Another thing that would be very useful is to enable ftrace. cd to /sys/kernel/debug/tracing (this assumes that you have debugfs mounted on /sys/kernel/debug), then "echo 1 > events/jbd2/jbd2_run_stats/enable", and then, in a similar fashion, "cat trace_pipe | tee /tmp/trace.output". Wait for four or five data samples from your file system of interest, then issue the sync command, and let's see what is happening.

Created attachment 24705 [details]
Requested iostat.log file

As requested by: Theodore Tso <tytso@mit.edu>

Created attachment 24706 [details]
messages output with sync running

Additional information for Theodore Tso <tytso@mit.edu>

I have attached the iostat.log and log/messages output. sync was started after about 4 cycles of iostat. This was run on the FC12 ext4 system; no testing can be done on the production system. After a kill -9 of the sync run, it took about 20 minutes before it died.

Michael

On Wed, Jan 27, 2010 at 02:06:25PM +0100, Andre Noll wrote:
> On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
> > After kill -9 of the sync run it took about 20 minutes before
> > it died.
>
> I was seeing similar behaviour on one of our servers, and changing
> the io scheduler to noop fixed things for me. So it seems to be an
> issue with cfq which is somehow triggered by ext4 but not by ext3.
>
> To change the IO scheduler, just execute
>
> echo noop > /sys/block/sda/queue/scheduler
>
> (replace sda if necessary).
Andre or Michael. If switching away from cfq helps, that's
definitely... interesting. Given that cfq is the default scheduler, I
definitely want to understand what might be going on here. Are either
of you able to run blktrace so we can get a sense of what is going on
under the cfq and deadline/noop I/O schedulers?
And in both of your cases, were you using a new file system freshly
created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
was converted for use under ext4?
Thanks,
- Ted
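For reference, one way to collect what Ted asks for here - not a recipe from this thread, just a sketch assuming the blktrace/blkparse tools are installed, debugfs is mounted, /dev/sda stands in for the real disk (on LVM it is the underlying physical disk that matters), and the rsync workload is already running:

    # repeat the slow-sync test once per scheduler and trace the disk each time
    for sched in cfq deadline noop; do
        echo "$sched" > /sys/block/sda/queue/scheduler

        blktrace -d /dev/sda -o trace-$sched &     # raw trace written to trace-$sched.blktrace.*
        TRACE_PID=$!

        time sync                                  # reproduce the stall under this scheduler

        kill -INT "$TRACE_PID"                     # SIGINT stops blktrace cleanly
        wait "$TRACE_PID"
        blkparse -i trace-$sched > blkparse-$sched.txt
    done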
Reply-To: maan@systemlinux.org

On 02:53, tytso@mit.edu wrote:
> Andre or Michael. If switching away from cfq helps, that's
> definitely... interesting. Given that cfq is the default scheduler, I
> definitely want to understand what might be going on here. Are either
> of you able to run blktrace so we can get a sense of what is going on
> under the cfq and deadline/noop I/O schedulers?

Yes, I can use that machine freely for testing purposes, including reboots. It is just our fallback server, which creates hardlink-based snapshots using rsync. However, I have to recompile the kernel to include debugfs, which is needed by blktrace, and I'd like to wait until the currently running rsync completes before rebooting. Would you like to see the output of btrace /dev/mapper/... or should I use more sophisticated command line options?

> And in both of your cases, were you using a new file system freshly
> created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
> was converted for use under ext4?

The ext4 file system was created from scratch using -O dir_index,uninit_bg,extent, a block size of 4096 and 32768 bytes per inode.

Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe

On 01/27/2010 11:53 PM, tytso@mit.edu wrote:
> And in both of your cases, were you using a new file system freshly
> created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
> was converted for use under ext4?

In my case it is new ext4. I will try what tests I can in a day or two. The systems are busy right now.

Michael

> In my case it is new ext4. I will try what tests I can in a day or
> two. The systems are busy right now.
Well, I'm flying back from New Zealand so I'll be off the grid mostly
until Sunday or Monday.
Thanks for your efforts!
- Ted
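For reference, the filesystem creation Andre describes a couple of comments up corresponds roughly to the following (a sketch only; the device name is a placeholder and the exact command Andre ran is not shown in the thread):

    # fresh ext4: explicit feature list, 4096-byte blocks, one inode per 32768 bytes
    mke2fs -t ext4 -O dir_index,uninit_bg,extent -b 4096 -i 32768 /dev/mapper/vg0-backup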
I had a chance to try noop and deadline. With the amount of testing that I could do they seemed to behave about the same, and:

1. The task timeouts are gone.
2. sync, which used to take over 20 minutes, is consistently about 10 minutes. I did retest by setting the scheduler back to cfq, and the sync time went back to 24 minutes.
3. I also tried two variations:
   1. rsync from source to target: rsync machinea:/aaa bbb
   2. rsync of an NFS-mounted filesystem to the local filesystem (i.e. the rsync thought it was local: rsync aaa/ bbb, but aaa was NFS-mounted (NFS3)).
   These two variations behaved just about identically.

I have left the system set to deadline, as recommended. With this setting (and with noop) things like du seem quicker too. So this is good, but 10 minutes for sync is still way too long. With these systems I cannot compile debug kernels or the like; they are in use for backup. Any other suggestions?

Michael

I have now set deadline for all tests, and I had a chance to try a system just like the one I have been using, but with an 8.1T ext4 partition instead of the 11T partition before. I used the same load: rsync /aaa /bbb, where /aaa was NFS-mounted. The behavior was quite similar, but the longest sync time was about 4 minutes. This suggests that the size of the filesystem is an important factor: increasing the filesystem size by 36% more than doubled the sync time. The size of the data set was somewhat larger on the 11T system, but part of that was unused in the transfer.

Michael

Created attachment 25050 [details]
Fix long sync times during heavy writing
Last week I noticed a bug in the writeback code causing sync to take longer than necessary. This patch should fix it. Is it possible for you to try it? Thanks.
On 02/15/2010 06:28 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> Last week I noticed a bug in the writeback code causing sync to take longer
> than necessary. This patch should fix it. Is it possible for you to try it?
> Thanks.

I have been following this, but right now I am leaving on a trip, and I cannot currently build kernels for the system where we get the problem. These systems are in production use for backup. Sorry that I cannot help more.

Michael

Created attachment 25447 [details]
additional data from kernel 2.6.32.9-67.fc12.x86_64
Again, I have had very limited time, but I ran an
rsync after update to kernel 2.6.32.9-67.fc12.x86_64.
The time for sync to complete was about the same as before
(about 10 minutes), and timeouts occurred as shown in the
attached log entries.
I assume that the patches are not included in this
kernel, but I just thought that I should report this test.
Michael
Michael, you're right, the patch is not in f12 (and not in upstream yet, either).

I tried another test with 2.6.32.10-90.fc12.x86_64. I did not expect an improvement, but the results were actually a lot worse. After starting an rsync which transferred a few hundred GB through NFS, I started a sync using "time sync". This caused a number of the usual 2-minute timeout messages, but the sync also did not complete until about 20 minutes after the rsync had completed; all together it ran for several hours. By the way, it was not possible to kill the sync using kill -9.

This is clearly hopeless. Will anything be done about this in 2.6.33 for fc13? Will the fact that Google is going with ext4 possibly help?

Michael

The patch in comment #18 is still not upstream. Jan, what's the status of that?

Michael, I did discover one issue upstream related to fsync, see: http://marc.info/?l=linux-ext4&m=126987658403214&w=2

This was very inefficient scanning of large files for sync. However, for sys_sync I didn't see the problem, because the loop was limited in that case, so it may not be related.

(In reply to comment #3)
> > This problem prevents production use of systems using this kernel.
> > evokes a question: Do you have a kernel which behaved better for you? Which one?
>
> Yes. RHEL5.4 does not show this problem. It is the production
> system that works in this environment.

RHEL5.4 on ext3 or ext4?

> The response above is disappointing. Is sync response of 20 minutes,
> including several task timeouts to be considered "normal?"

Probably not, but it really depends. If you have a system with massive amounts of memory and a slow path to the disk, then sure, if you have to flush many, many gigabytes it will be slow. But that's extreme, and I don't think you're in that case. You do have a 12G box, though, so that's potentially a lot of memory to flush; on the other hand, your storage should probably be reasonably fast. It does seem like something else is going on here.

(In reply to comment #22)
> I tried another test with 2.6.32.10-90.fc12.x86_64. I did
> not expect an improvement. But, the results were actually
> a lot worse. [...]
> This is clearly hopeless.

Hm, don't give up quite yet ;) Can you describe this test a little more explicitly: which box was the nfs server vs. client, which boxes were the rsync servers/clients, and which box ran sync? I just don't want to make wrong assumptions in trying to recreate this.

> Will anything be done about this in 2.6.33 for fc13?

We still have to get to the bottom of the problem before we can talk about fixes, I'm afraid.

> Will the fact that Google is going with ext4 possibly help?

I don't think so.

One thing that may be interesting is to run blktrace (or use seekwatcher to do that for you) during the sync call that is stalling out, to get an idea of what is happening at the block layer and when.

For what it's worth, assuming I have replicated the behavior properly, the long-running sync doesn't seem unique to ext4 at all. I can replicate it by running a script which creates 4G files in sequence, putting it in the background, sleeping for a while, and typing "sync" - which never returns. I see the same behavior on ext4 as well as on xfs and ext3.
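A minimal version of the reproducer Eric describes (a sketch; the mount point, file size, and sleep interval are placeholders):

    # keep the filesystem busy with large sequential writes in the background
    (
        i=0
        while true; do
            dd if=/dev/zero of=/mnt/test/file.$i bs=1M count=4096 2>/dev/null
            i=$((i + 1))
        done
    ) &
    WRITER_PID=$!

    sleep 60          # let dirty pages pile up first
    time sync         # on affected kernels this can take many minutes or not return

    kill "$WRITER_PID"   # stops the loop; any in-flight dd finishes its current file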
I applied Jan's patch from comment #18, and the behavior is unchanged.

Reading through wb_writeback, it could happen that the flushing thread gets stuck in background flushing and thus never gets to processing the work queued for sync(1), so sync(1) never finishes. The attached patch should fix that. Eric, could you please test whether this patch, together with the patch from comment 18, fixes your testcase?

Created attachment 25882 [details]
Patch to stop background writeback when other work is queued for the thread
Jan, will do after lunch. Thanks for looking into this!

-Eric

Created attachment 25883 [details]
Patch to stop background writeback when other work is queued for the thread
Oops, attached a wrong patch. This is the right one.
A quick test on ext3 looks good; ext4 still seems to run away on sync :( But I need to be a bit more methodical and test a few more filesystems; will let you know.

I'm not familiar enough with all the new writeback code; does this mean that a sync will return as soon as any new IO is queued post-sync? That seems odd if so - but maybe I misunderstand.

Thanks,
-Eric

Hm, maybe I spoke too soon; I had a couple of runs on ext3 that looked good, but now it's been syncing for many minutes... (this is the test where I create 4G files in a loop, let it go for a while, then time sync - on a 16g box)

(In reply to comment #29)
> I'm not familiar enough with all the new writeback code; does this mean that
> a sync will return as soon as any new IO is queued post-sync? That seems odd
> if so - but maybe I misunderstand.

No. The patch means that the writeback thread stops doing pdflush-style writeback when it sees new work queued - "work" does not mean IO. It means that someone has asked the writeback thread to do some kind of writeout. Now, I'm still not sure my patch is the right approach to the problem, but I just wanted to check whether it at least solves the problem for this particular workload. If you still see the problem (only less often) even with ext3, then we probably also have some other work that is livelockable, and thus we never get to the work submitted by sync(1). I guess I'll have to find a machine with enough disks and memory to try this out...

I'm doing this on a machine with a single spare disk dedicated to the test, so disks aren't an issue. I haven't tried to replicate with less memory. I'll try to poke at it a bit more today...

-Eric

I think I might be seeing the same/similar issue, see "dmesg | tail" below. It seems fairly repeatable and usually occurs when I move a file from my download drive to my RAID 5 array (5 disks, 6TB). I'm running Ubuntu 9.10 with all the latest official patches.

[301080.930044] INFO: task kjournald2:1583 blocked for more than 120 seconds.
[301080.930048] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[301080.930051] kjournald2    D ffff88007734e8e0     0  1583      2 0x00000000
[301080.930056]  ffff88006c38dd10 0000000000000046 ffff88005695d850 0000000000015880
[301080.930060]  ffff880070d247c0 0000000000015880 0000000000015880 0000000000015880
[301080.930063]  0000000000015880 ffff880070d247c0 0000000000015880 0000000000015880
[301080.930066] Call Trace:
[301080.930075]  [<ffffffff811f096a>] jbd2_journal_commit_transaction+0x1aa/0x1120
[301080.930080]  [<ffffffff8127c336>] ? rb_erase+0xd6/0x160
[301080.930085]  [<ffffffff81010785>] ? __switch_to+0x1e5/0x370
[301080.930088]  [<ffffffff8104f075>] ? finish_task_switch+0x65/0x120
[301080.930093]  [<ffffffff8152c98a>] ? _spin_lock_irqsave+0x2a/0x40
[301080.930097]  [<ffffffff8106af57>] ? lock_timer_base+0x37/0x70
[301080.930101]  [<ffffffff81078a30>] ? autoremove_wake_function+0x0/0x40
[301080.930105]  [<ffffffff811f6963>] kjournald2+0x103/0x270
[301080.930108]  [<ffffffff81078a30>] ? autoremove_wake_function+0x0/0x40
[301080.930111]  [<ffffffff811f6860>] ? kjournald2+0x0/0x270
[301080.930114]  [<ffffffff81078646>] kthread+0xa6/0xb0
[301080.930117]  [<ffffffff8101316a>] child_rip+0xa/0x20
[301080.930120]  [<ffffffff810785a0>] ? kthread+0x0/0xb0
[301080.930123]  [<ffffffff81013160>] ? child_rip+0x0/0x20
[301080.930145] INFO: task mv:2005 blocked for more than 120 seconds.
[301080.930146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[301080.930148] mv            D ffff880079705928     0  2005   2464 0x00000000
[301080.930152]  ffff8800210f9868 0000000000000086 ffff8800210f97e8 0000000000015880
[301080.930155]  ffff880003b747c0 0000000000015880 0000000000015880 0000000000015880
[301080.930159]  0000000000015880 ffff880003b747c0 0000000000015880 0000000000015880
[301080.930162] Call Trace:
[301080.930165]  [<ffffffff8152c87d>] __down_read+0x8d/0xc6
[301080.930169]  [<ffffffff8152bc49>] down_read+0x19/0x20
[301080.930172]  [<ffffffff811b5fe2>] ext4_get_blocks+0x52/0x210
[301080.930175]  [<ffffffff811b62e7>] ext4_da_get_block_prep+0x77/0x100
[301080.930179]  [<ffffffff81148663>] __block_prepare_write+0x1c3/0x560
[301080.930182]  [<ffffffff811b6270>] ? ext4_da_get_block_prep+0x0/0x100
[301080.930185]  [<ffffffff81148b9f>] block_write_begin+0x5f/0x100
[301080.930188]  [<ffffffff811b8a0d>] ext4_da_write_begin+0x12d/0x260
[301080.930191]  [<ffffffff811b6270>] ? ext4_da_get_block_prep+0x0/0x100
[301080.930194]  [<ffffffff8104f075>] ? finish_task_switch+0x65/0x120
[301080.930198]  [<ffffffff810da1a2>] generic_perform_write+0xb2/0x1d0
[301080.930202]  [<ffffffff811e4dbb>] ? ext4_xattr_get+0x5b/0x90
[301080.930206]  [<ffffffff810dafd3>] generic_file_buffered_write+0x83/0x140
[301080.930209]  [<ffffffff810dc950>] __generic_file_aio_write_nolock+0x240/0x470
[301080.930213]  [<ffffffff811348f3>] ? touch_atime+0x33/0x150
[301080.930216]  [<ffffffff810dcca0>] generic_file_aio_write+0x70/0xf0
[301080.930221]  [<ffffffff811ae7f9>] ext4_file_write+0x49/0x160
[301080.930225]  [<ffffffff8111f342>] do_sync_write+0xf2/0x130
[301080.930229]  [<ffffffff811511db>] ? fsnotify+0xfb/0x140
[301080.930232]  [<ffffffff81078a30>] ? autoremove_wake_function+0x0/0x40
[301080.930235]  [<ffffffff81133333>] ? dput+0xc3/0x190
[301080.930239]  [<ffffffff81224a11>] ? security_file_permission+0x11/0x20
[301080.930242]  [<ffffffff8111f628>] vfs_write+0xb8/0x1a0
[301080.930245]  [<ffffffff811200dc>] sys_write+0x4c/0x80
[301080.930249]  [<ffffffff81012082>] system_call_fastpath+0x16/0x1b

Let me know if I can be of any further assistance. If you require more info or test running, I'm happy to help.

Neil

See: http://marc.info/?l=linux-fsdevel&m=127166071530948&w=2

Sync is acting as designed right now. I agree it's not ideal, but it is now defaulting to slow-but-safe behaviour rather than the previous behaviour of potentially not syncing everything that was dirty at the time of the sync call.

Cheers,
Dave.

Dave (above) said: "Sync is acting as designed right now. I agree it's not ideal, but it's now defaulting to slow-but-safe behaviour rather than the previous behaviour of potentially not syncing everything that was dirty at the time of the sync call."

Are you aware that this blocks other IO, so that a user who requests a read of some data may have to wait for something like 20 minutes before getting a response? This includes, for instance, just typing "vi xxx". Take a look at the reports above which show nfsd being effectively blocked for periods of more than 20 minutes. For me this is not just "not ideal" but simply useless; I do not see how a system with this behavior can be used.

I also do not see why sync completing with dirty data is a problem. In an active system there will be new dirty data within milliseconds of sync completion no matter what it does. I am well aware that this is not a simple problem, but a solution that is consistent with the usability of the system is necessary.
Michael

(In reply to comment #35)
> Are you aware that this blocks other IO so that a user who requests
> a read of some data may have to wait for something like 20 minutes
> before getting a response? This includes, for instance, just typing
> vi xxx.

That's not just a read - vi(m) writes a backup file when you open it, so it's blocking on writes. However, that sort of antisocial behaviour under heavy write loads is usually caused by a filesystem concurrency limitation or an IO scheduler problem, not by sync. That said, unless you can reproduce the read hangs on XFS when sync is running, I'm not the expert you're looking for to debug them. ;) But I do know the endless-sync problem is filesystem independent, and I'm trying to do something about mitigating its effects: http://lkml.org/lkml/2010/4/19/410

> Take a look at the reports above which show nfsd being effectively
> blocked for periods of more than 20 minutes.

Can't say I've heard of any such recent problems on XFS....

Cheers,
Dave.

"Can't say I've heard of any such recent problems on XFS...."

Are these not effectively the same issue?
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276476
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/494476

Neil

Just a couple of comments on the posts above:

1. In our environment, home directories are NFS mounted, so vi cannot read .vimrc. It appears that this is where it hangs. The important thing is that nothing appears on the user's screen after typing vi.
2. We do not have an XFS filesystem, so I cannot comment on whether these problems exist for XFS.
3. The comments indicate that, as usual, there is more than one problem. I am hopeful that the ext4 problem that Eric Sandeen is fixing and the proposed fixes for sync will make a significant difference.

(In reply to comment #28)
> Created an attachment (id=25883) [details]
> Patch to stop background writeback when other work is queued for the thread
>
> Oops, attached a wrong patch. This is the right one.

I ran into this issue recently (extra-long sync times) and have been trying the patches attached in this thread. The above patch specifically causes a pretty significant performance regression for me on a simple sequential dd write on a dual-core Atom system running 2.6.33.4 x86_64. This command:

dd bs=1M conv=fsync if=/dev/zero of=test_file count=10000

went from 135 MB/sec all the way down to 92.9 MB/sec. Is this expected?

Created attachment 26456 [details]
Patch to stop background writeback when other work is queued for the thread
Ah, there's a bug in the condition in the original patch, and thus it stops background writeback when there is *no* other work queued instead of when there *is*. Could you please test this fixed version?
Ah, much better. :) Performance for me with this version is now just about the same as without the patch.
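For anyone retesting, the throughput comparison from the regression report above amounts to timing an fsync-bounded sequential write on the unpatched and patched kernels and comparing the rates dd prints (135 MB/sec vs. 92.9 MB/sec in that report). A sketch; the output path is illustrative:

    # 10 GB sequential write; conv=fsync makes dd flush the file to disk before
    # exiting, so the reported MB/s includes writeback, not just page-cache speed
    dd bs=1M conv=fsync if=/dev/zero of=/mnt/test/test_file count=10000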
Created attachment 24220 [details]
description of system and excerpts from /var/log/messages

The sync task runs for a very long time and generates log messages: "INFO: task sync:1996 blocked for more than 120 seconds." Under similar conditions the same messages appear in /var/log/messages, but referencing nfsd. During these periods performance drops drastically. See attached.
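The "blocked for more than 120 seconds" lines come from the kernel's hung-task watchdog; the threshold can be inspected, or the report silenced, as the message itself suggests (a sketch - disabling the report only hides the symptom, the tasks are still blocked):

    # show the current hung-task warning threshold (120 s by default)
    sysctl kernel.hung_task_timeout_secs

    # silence the warnings without changing the underlying blocking behaviour
    echo 0 > /proc/sys/kernel/hung_task_timeout_secs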