Bug 14491

Summary: Large I/O operations cause constant seeking on a drive unrelated to the I/O operation
Product: IO/Storage Reporter: rocko (rockorequin)
Component: Block Layer Assignee: Jens Axboe (axboe)
Status: RESOLVED CODE_FIX    
Severity: normal CC: brian, come.desplats, daniel, nalimilan, pauphelle, thomas.pi
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31 Subsystem:
Regression: No Bisected commit-id:
Attachments: two scheduling-while-atomic traces

Description rocko 2009-10-26 23:46:08 UTC
When I copy large amounts of data from one external device (say /dev/sdb) to another slow USB flash key (say /dev/sdc), I can hear my *internal* hard drive (/dev/sda) thrashing away constantly even though its light indicates that no read/write activity is going on. I imagine that the internal hard drive must be unrelated to the copy operation other than for buffering, so there shouldn't be any reason for this.

During this time anything that requires access to /dev/sda is slowed right down and hence running new programs slows down disk access.

When I start copying, eg using nautilus, there is usually a 400 MB buffering delay before writing starts to the USB drive (ie before its light starts flashing). During this time, there is NO /dev/sda thrashing. /dev/sda starts thrashing as soon as the USB key light starts flashing.

So there appears to be a bug that makes /dev/sda constantly seek during the /dev/sdc USB write operation.

I think that this bug has been around in previous kernels, too, eg at least from 2.6.24 and it might be related to bug #12309. I'm opening a new bug since that one has been marked 'resolved - insufficient data'.

This bug would normally go unnoticed other than for the drop in desktop responsiveness, since (a) it wouldn't be apparent on quiet drives, and (b) it wouldn't be so apparent when copying to/from the internal hard drive, since its read/write light would be constantly on.
Comment 1 Jens Axboe 2009-10-27 08:14:48 UTC
Please try 2.6.32-rc5. Make sure you are using CFQ as your io scheduler.
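
For reference, the active scheduler for a block device is exposed in sysfs at /sys/block/<device>/queue/scheduler, with the selected entry shown in square brackets. The following is a minimal sketch of how one might check it, assuming the device is sda as in the report; the function names are illustrative and not part of any kernel tooling.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked with [brackets] in the sysfs text."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # No bracketed entry found: fall back to the stripped raw text.
    return sysfs_text.strip()


def read_scheduler(device: str = "sda") -> str:
    """Read and parse the scheduler file for the given block device."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # "sda" is an assumption taken from the report; adjust as needed.
    if Path("/sys/block/sda/queue/scheduler").exists():
        print(read_scheduler("sda"))
```

On a kernel of this era the file typically reads something like "noop anticipatory deadline [cfq]", for which the sketch would report cfq.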
Comment 2 rocko 2009-10-28 02:28:26 UTC
I tried 2.6.32-rc5 and couldn't make it happen. (I'm using cfq in both kernels.)

I think the problem might be to do with the way buffering is being handled. After a reboot, 2.6.31 _doesn't_ exhibit the problem, but once I load up a bunch of applications so that my 4GB RAM is near full, 2.6.31 will start the disk thrashing.

The behaviour of each kernel during copying two 800MB files differs like so:

2.6.31: nautilus' copy bar zooms away, showing that 400 MB, 600 MB, 800 MB, 1GB, etc has been copied. When the RAM is near full, disk thrashing occurs.

2.6.32: nautilus' copy bar slows down either (a) at the end of each file or (b) (presumably) when free RAM is exhausted. After a reboot, nautilus's copy bar goes up to 800 MB and then pauses until the first file is written before continuing. If I haven't got much RAM available, the copy bar goes up to (say) 300MB and then slows down while the write operation catches up.
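
One way to observe the buffering behaviour described above is to watch how much data is waiting to be written back while the copy runs. The following is a minimal sketch, assuming a Linux /proc/meminfo that reports Dirty and Writeback values in kB; the helper names are hypothetical and it only samples the counters, it does not diagnose the seeking itself.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def writeback_pressure(text: str) -> tuple:
    """Return (Dirty, Writeback) in kB, defaulting to 0 when a field is absent."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0), info.get("Writeback", 0)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        dirty, writeback = writeback_pressure(meminfo.read_text())
        print(f"Dirty: {dirty} kB, Writeback: {writeback} kB")
```

Sampling this every second or so during the copy would show whether the copy bar races ahead of the actual writes (Dirty grows large) or is throttled to the speed of the USB key.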
Comment 3 rocko 2009-10-28 02:43:01 UTC
Just an observation on 2.6.32-rc5: I am getting the occasional 'scheduling while atomic: swapper/0/0x10000100' bug (X freezes for a while when this happens). Is that related to the cfq scheduler or another area?
Comment 4 Jens Axboe 2009-10-28 08:00:20 UTC
Please post the backtrace from that error message.
Comment 5 rocko 2009-10-28 10:19:34 UTC
Created attachment 23557
two scheduling-while-atomic traces

Here's a trace from kern.log of two separate incidents.
Comment 6 Jens Axboe 2009-10-28 10:21:13 UTC
Does it trigger without the nvidia module?
Comment 7 rocko 2009-10-28 11:43:03 UTC
I'm having trouble triggering the oops now. It happened the first two times I booted and not since (either with or without the nvidia module). I thought it might have been when I told a VirtualBox VM to save its state to disk, but I've done many cycles of it now without it re-occurring.
Comment 8 rocko 2009-12-09 02:10:56 UTC
An update on the original bug: I believe this has been fixed in the 2.6.32 kernel as I haven't been able to reproduce it at all in the five weeks or so that I've been using it. I also haven't seen the oops mentioned above since 2.6.32-rc5.
Comment 9 Jens Axboe 2009-12-09 11:24:07 UTC
Thanks, 2.6.32 (and later .31-rc's) should be much better in this regard. Closing.