Distribution: Debian testing/unstable

Software Environment: Kernel is actually from http://wiki.debian.org/DebianKernel. Version is 2.6.27-1~experimental.1~snapshot.12516, which appears to be based on 2.6.27.10.

Problem Description: I mentioned this in my last bug report (http://bugzilla.kernel.org/show_bug.cgi?id=12424), but it seems that was something unrelated (I think). My firefox downloads have been hanging off and on since reformatting ext3->ext4. I've finally found a way to reproduce this 100% of the time.

Steps:
1.) download a large file to a folder on an ext4 fs
2.) close all firefox windows except the download manager before the download completes
3.) the download will hang at nearly 100%

This happens every single time. If I keep another firefox window open (besides the downloads mini-window), the download always completes successfully. If I do the same steps except save the file to an ext3 fs, it always works.
In case there is some thread that is hung up, can you try sysrq-w (echo w > /proc/sysrq-trigger) and see what you get in dmesg? (you can attach that here) -Eric
Created attachment 20067 [details] output of sysrq-w
ok, nothing interesting there ...
How big is a "large file"? Can you give a sample URL? Also, can you try collecting the output of: ps -wweo uid,pid,ppid,pri,wchan:20,stat,time,command If any firefox or kjournal processes are running, perhaps sysrq-l will give us something useful.
So, I don't think it actually needs to be a very large file, just large enough that I can close the main firefox window before it completely downloads. I've actually been downloading firefox itself as a testcase because it takes a few seconds to finish.

ps output attached.

sysrq-l didn't have anything, just: "[1893205.442069] SysRq : Show backtrace of all active CPUs"

I should point out that firefox isn't "hung" in the traditional sense. It's not using 100% cpu. I can cancel the download and exit just fine. But the download itself is hung and will never finish. The only reason I'm reporting this to ext4 instead of firefox is that it's 100% reproducible w/ext4 and works 100% of the time w/ext3.
Created attachment 20070 [details] ps output
One more thing: After I have a download in the "hung" state, if I close the download window, start firefox again, and then open the download window, I get the following message from firefox:

---
/home/avery/downloads/firefox-3.0.5.tar.bz2.part could not be saved, because the source file could not be read. Try again later, or contact the server administrator.
---

avery@polar:~/downloads$ ls -l firefox-3.0.5.tar.bz2*
-rw------- 1 avery avery       0 2009-02-01 21:39 firefox-3.0.5.tar.bz2
-rw------- 1 avery avery 9112341 2009-02-01 21:54 firefox-3.0.5.tar.bz2.part
Thanks, I was about to ask about file sizes, that's part of it. How big should this file be when it does complete properly? Thanks, -Eric
The only thing I can think of doing at this point would be to strace firefox under ext3 and ext4 and see if we can see a difference in terms of what happens --- and what firefox is doing when it is writing the file and where it is hanging.
when download is hung:

avery@polar:~/downloads$ ls -l firefox*
-rw------- 1 avery avery       0 2009-02-01 22:32 firefox-3.0.5.tar.bz2
-rw------- 1 avery avery 9111221 2009-02-01 22:32 firefox-3.0.5.tar.bz2.part
avery@polar:~/downloads$ md5sum firefox-3.0.5.tar.bz2.part
6638ab249ae75d8fd345b34c187e87d4  firefox-3.0.5.tar.bz2.part

after closing download window:

avery@polar:~/downloads$ ls -l firefox*
-rw------- 1 avery avery       0 2009-02-01 22:32 firefox-3.0.5.tar.bz2
-rw------- 1 avery avery 9112341 2009-02-01 22:33 firefox-3.0.5.tar.bz2.part
avery@polar:~/downloads$ md5sum firefox-3.0.5.tar.bz2.part
9ee0b64ab41bb30c0be00ceb972f111c  firefox-3.0.5.tar.bz2.part

a good download:

avery@polar:~/downloads$ ls -l firefox-3.0.5.tar.bz2
-rw-r--r-- 1 avery avery 9112341 2009-02-01 22:36 firefox-3.0.5.tar.bz2
avery@polar:~/downloads$ md5sum firefox-3.0.5.tar.bz2
9ee0b64ab41bb30c0be00ceb972f111c  firefox-3.0.5.tar.bz2

So, it appears it actually completes the download. I'm not sure why it's not completely written out before I close the window.
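For anyone repeating the size and md5sum comparison above, here is a minimal Python sketch of the same check. The helper names file_fingerprint and same_content are made up for illustration; only the standard hashlib and os modules are used.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=32 * 1024):
    """Return (size_in_bytes, md5_hex) for the file at `path`.

    Hypothetical helper: it mirrors running `ls -l` and `md5sum` by hand.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large downloads are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have the same size and the same MD5 digest."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)
```

Calling same_content() on the .part file and a known-good copy would show whether the partial file already holds the complete data, as it did here once the download window was closed.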
About strace: The debian strace maintainer is MIA and I can't strace anything for more than a few seconds due to: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=511083 If you think it would be really useful, I can try compiling the latest cvs of strace to see if the bug is fixed.
I ended up getting strace from cvs, which fixes my bug. Two straces attached.

bad: starts right before I click the download link and ends at most a second or two after it's hung.

good: starts right before I click the download link and ends a few seconds after the download completes.
Created attachment 20072 [details] bad strace
Created attachment 20073 [details] good strace
Oh, you can find the file descriptors for the files related to the download by grepping for 'tar.bz2'.
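As a rough aid for reading the attached logs, the sketch below finds the descriptors opened on paths containing 'tar.bz2' and adds up the bytes successfully written to them. It assumes the usual strace line layout (open(...) = fd, write(fd, ..., n) = n, close(fd) = 0). The function name bytes_written and the SAMPLE lines are invented for illustration and are not taken from the attachments.

```python
import re
from collections import defaultdict

# Optional "[pid N] " or "N " prefix, as strace prints when following threads.
PREFIX = r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
OPEN_RE = re.compile(PREFIX + r'open(?:at)?\(.*?"([^"]*)".*\)\s*=\s*(\d+)')
WRITE_RE = re.compile(PREFIX + r'write\((\d+),.*\)\s*=\s*(\d+)')
CLOSE_RE = re.compile(PREFIX + r'close\((\d+)\)\s*=\s*0')


def bytes_written(lines, name_fragment):
    """Sum successful write() sizes per path for files matching name_fragment."""
    fd_to_path = {}
    totals = defaultdict(int)
    for line in lines:
        opened = OPEN_RE.match(line)
        if opened:
            if name_fragment in opened.group(1):
                fd_to_path[int(opened.group(2))] = opened.group(1)
            continue
        closed = CLOSE_RE.match(line)
        if closed:
            # Forget the descriptor so a reused fd number is not miscounted.
            fd_to_path.pop(int(closed.group(1)), None)
            continue
        written = WRITE_RE.match(line)
        if written and int(written.group(1)) in fd_to_path:
            totals[fd_to_path[int(written.group(1))]] += int(written.group(2))
    return dict(totals)


# Synthetic example lines (NOT from the attached traces).
SAMPLE = [
    'open("/home/user/downloads/example.tar.bz2.part", O_WRONLY|O_CREAT|O_TRUNC, 0600) = 54',
    'write(54, "..."..., 32768) = 32768',
    'write(18, "\\372", 1) = 1',
    'write(54, "..."..., 32768) = 32768',
    'write(54, "..."..., 2837) = 2837',
    'close(54) = 0',
]

if __name__ == "__main__":
    print(bytes_written(SAMPLE, "tar.bz2"))
```

Running the same function over the bad and good logs and comparing the totals against the expected file size would show how many bytes never reached the .part file.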
Unfortunately the strace logs aren't complete because firefox is multi-threaded, and it looks like strace is only tracing one thread. So we can see that the thread which writes the downloaded file does a poll(2) for a set of file descriptors, including fd 18, and then it reads a byte from fd 18, and then writes a buffer to fd 54, which is the firefox-3.0.5.tar.bz2.part file. But it is always writing in 32k chunks, and it's not writing the last 2837 bytes, as it does in the good strace.

It looks like the thread which reads from the network isn't signalling that the last set of bytes is there, but why, I have no idea. It also seems very strange that this is filesystem-specific; whatever it is, there isn't anything in the file-writing thread that would hint at this. I also can't duplicate it on my end.

I wonder if it's something stupid like the writes are returning much faster, and this is triggering a race condition in firefox. Maybe some other thread is checking to see when the write is completing by stat'ing the fd, or something stupid like that.

Something that might be worth trying is to chattr +S your downloads directory, which will force a sync after every write, and see if that makes a difference when you download the file from scratch. You'll want to do a "chattr -S downloads downloads/*" afterwards, since a sync after every write does a real number on performance. But if that causes firefox to succeed, then it's probably some weird timing/race condition problem in firefox.

I'll note that I can't reproduce this on my firefox on my Ubuntu/Hardy system. BTW, how many CPUs do you have, and which version of Firefox are you running?
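The 2837-byte figure is consistent with the file size reported earlier in this bug: a 9112341-byte file written in 32k chunks leaves exactly that remainder for the final write. A small sketch of the arithmetic (the function name split_into_chunks is made up for illustration):

```python
CHUNK_SIZE = 32 * 1024   # 32k write size seen in the strace
FULL_SIZE = 9112341      # size of the completed download, from ls -l above


def split_into_chunks(total_size, chunk_size=CHUNK_SIZE):
    """Return (number_of_full_chunks, size_of_final_short_write)."""
    return divmod(total_size, chunk_size)


if __name__ == "__main__":
    full_chunks, tail = split_into_chunks(FULL_SIZE)
    print(full_chunks, tail)  # prints "278 2837"
```

So a complete download would be 278 full 32k writes followed by one short write of 2837 bytes, and it is that short write which is absent from the bad trace.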
Any luck reproducing this problem? Especially on a more recent kernel version? If I don't get a response, I plan to close this bug, since we've fixed a lot of problems in the last couple of months....
Sorry, this slipped my mind. It was almost certainly a race condition in firefox. I ended up cleaning up a bunch of old stuff in my home directory (specifically in the folder that I was downloading to) and it just stopped happening altogether.