Bug 12603 - firefox downloads hang on ext4; fine on ext3
Summary: firefox downloads hang on ext4; fine on ext3
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-01 14:32 UTC by Avery Fay
Modified: 2009-05-20 20:20 UTC

See Also:
Kernel Version: 2.6.27.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments
output of sysrq-w (5.19 KB, text/plain)
2009-02-01 16:05 UTC, Avery Fay
ps output (12.42 KB, text/plain)
2009-02-01 18:52 UTC, Avery Fay
bad strace (314.74 KB, application/x-bzip)
2009-02-01 20:42 UTC, Avery Fay
good strace (443.65 KB, application/x-bzip)
2009-02-01 20:43 UTC, Avery Fay

Description Avery Fay 2009-02-01 14:32:20 UTC
Distribution:

Debian testing/unstable

Software Environment:

Kernel is actually from http://wiki.debian.org/DebianKernel. Version is
2.6.27-1~experimental.1~snapshot.12516, which appears to be based on 2.6.27.10.

Problem Description:

I mentioned this in my last bug report (http://bugzilla.kernel.org/show_bug.cgi?id=12424), but it seems that was something unrelated (I think). My firefox downloads have been hanging off and on since reformatting ext3->ext4. I've finally found a way to reproduce this 100% of the time.

Steps:
1.) download large file to folder on ext4 fs
2.) close all firefox windows except the download manager before download completes
3.) download will hang at nearly 100%

This happens every single time. If I keep another firefox window open (besides the downloads mini-window), the download always completes successfully. If I do the same steps except save the file to an ext3 fs, it always works.
Comment 1 Eric Sandeen 2009-02-01 15:35:02 UTC
In case there is some thread that is hung up, can you try sysrq-w (echo w > /proc/sysrq-trigger) and see what you get in dmesg?  (you can attach that here)

-Eric
Comment 2 Avery Fay 2009-02-01 16:05:27 UTC
Created attachment 20067
output of sysrq-w
Comment 3 Eric Sandeen 2009-02-01 16:11:10 UTC
ok, nothing interesting there ...
Comment 4 Theodore Tso 2009-02-01 18:37:03 UTC
How big is a "large file"?   Can you give a sample URL?

Also, can you try collecting the output of:

ps -wweo uid,pid,ppid,pri,wchan:20,stat,time,command

If any firefox or kjournal processes are running, perhaps sysrq-l will give us something useful.
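
Once collected, the output of that ps command can be filtered for tasks in uninterruptible sleep (STAT beginning with D), which is often where a filesystem-related hang shows up. A minimal sketch, assuming the eight columns requested above plus the usual header line; the function name `blocked_tasks` is illustrative and not part of any tool:

```python
def blocked_tasks(ps_output):
    """Return (pid, wchan, command) for rows whose STAT starts with 'D'."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        # Columns: uid, pid, ppid, pri, wchan, stat, time, command.
        # The command may contain spaces, so split at most 7 times.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        uid, pid, ppid, pri, wchan, stat, cpu_time, command = fields
        if stat.startswith("D"):
            rows.append((int(pid), wchan, command))
    return rows
```

An empty result would suggest that nothing is blocked inside the kernel and that the stall is more likely in user space.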
Comment 5 Avery Fay 2009-02-01 18:52:25 UTC
So, I don't think it actually needs to be a very large file, just large enough so that I can close the main firefox window before it completely downloads. I've actually been downloading firefox itself as a testcase because it takes a few seconds to finish.

ps output attached

sysrq-l didn't show anything, just:

"[1893205.442069] SysRq : Show backtrace of all active CPUs"

I should point out that firefox isn't "hung" in the traditional sense. It's not using 100% cpu. I can cancel the download and exit just fine. But the download itself is hung and will never finish. The only reason I'm reporting this to ext4 instead of firefox is that it's 100% reproducible w/ext4 and works 100% of the time w/ext3.
Comment 6 Avery Fay 2009-02-01 18:52:48 UTC
Created attachment 20070
ps output
Comment 7 Avery Fay 2009-02-01 18:57:13 UTC
One more thing:

After I have a download in the "hung" state, if I close the download window, start firefox again, and then open the download window, I get the following message from firefox:

---
/home/avery/downloads/firefox-3.0.5.tar.bz2.part could not be saved, because the source file could not be read.

Try again later, or contact the server administrator.
---

avery@polar:~/downloads$ ls -l firefox-3.0.5.tar.bz2*
-rw------- 1 avery avery       0 2009-02-01 21:39 firefox-3.0.5.tar.bz2
-rw------- 1 avery avery 9112341 2009-02-01 21:54 firefox-3.0.5.tar.bz2.part
Comment 8 Eric Sandeen 2009-02-01 19:29:01 UTC
Thanks, I was about to ask about file sizes, that's part of it.  How big should this file be when it does complete properly?

Thanks,
-Eric
Comment 9 Theodore Tso 2009-02-01 19:32:12 UTC
The only thing I can think of doing at this point would be to strace firefox under ext3 and ext4 and see if we can see a difference in terms of what happens --- and what firefox is doing when it is writing the file and where it is hanging.  
Comment 10 Avery Fay 2009-02-01 19:39:21 UTC
when download is hung:

avery@polar:~/downloads$ ls -l firefox*
-rw------- 1 avery avery       0 2009-02-01 22:32 firefox-3.0.5.tar.bz2
-rw------- 1 avery avery 9111221 2009-02-01 22:32 firefox-3.0.5.tar.bz2.part
avery@polar:~/downloads$ md5sum firefox-3.0.5.tar.bz2.part 
6638ab249ae75d8fd345b34c187e87d4  firefox-3.0.5.tar.bz2.part

after closing download window:

avery@polar:~/downloads$ ls -l firefox*
-rw------- 1 avery avery       0 2009-02-01 22:32 firefox-3.0.5.tar.bz2
-rw------- 1 avery avery 9112341 2009-02-01 22:33 firefox-3.0.5.tar.bz2.part
avery@polar:~/downloads$ md5sum firefox-3.0.5.tar.bz2.part 
9ee0b64ab41bb30c0be00ceb972f111c  firefox-3.0.5.tar.bz2.part

a good download:

avery@polar:~/downloads$ ls -l firefox-3.0.5.tar.bz2
-rw-r--r-- 1 avery avery 9112341 2009-02-01 22:36 firefox-3.0.5.tar.bz2
avery@polar:~/downloads$ md5sum firefox-3.0.5.tar.bz2
9ee0b64ab41bb30c0be00ceb972f111c  firefox-3.0.5.tar.bz2

so, it appears it actually completes the download. I'm not sure why it's not completely written out before I close the window. 
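
The size and checksum comparison above can also be scripted. A minimal sketch, assuming Python 3 is available; the helper names `describe` and `same_content` are illustrative and not part of the original report:

```python
import hashlib
import os


def describe(path, chunk_size=32 * 1024):
    """Return (size_in_bytes, md5_hex) for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_content(part_path, final_path):
    """True if the two files have identical size and MD5 digest."""
    return describe(part_path) == describe(final_path)
```

Calling `describe()` on the .part file while the download is hung, and again after closing the download window, would show whether the tail of the file was written in between.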
Comment 11 Avery Fay 2009-02-01 19:46:19 UTC
About strace:

The Debian strace maintainer is MIA and I can't strace anything for more than a few seconds due to:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=511083

If you think it would be really useful, I can try compiling the latest cvs of strace to see if the bug is fixed.
Comment 12 Avery Fay 2009-02-01 20:42:24 UTC
I ended up getting strace from CVS, which fixes my bug. Two straces attached.

bad: starts right before I click the download link and ends at most a second or 2 after it's hung.
good: starts right before I click the download link and ends a few seconds after download completes.
Comment 13 Avery Fay 2009-02-01 20:42:58 UTC
Created attachment 20072
bad strace
Comment 14 Avery Fay 2009-02-01 20:43:34 UTC
Created attachment 20073
good strace
Comment 15 Avery Fay 2009-02-01 20:44:55 UTC
Oh, you can find the file descriptors for the files related to the download by grepping for 'tar.bz2'.
Comment 16 Theodore Tso 2009-02-02 04:33:20 UTC
Unfortunately the strace logs aren't complete because firefox is multi-threaded, and it looks like strace is only tracing one thread.   So we can see that the thread which writes the downloaded file does a poll(2) for a set of file descriptors, including fd 18, and then it reads a byte from fd 18, and then writes a buffer to fd 54, which is the firefox-3.0.5.tar.bz2.part file.   But it is always writing in 32k chunks, and it's not writing the last 2837 bytes, as in the good strace.
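
The 2837-byte tail is consistent with the completed file size shown in comment 10. A quick arithmetic check, assuming "32k" means 32768 bytes and that the traced download was the same 9112341-byte file:

```python
FILE_SIZE = 9112341      # size of the completed firefox-3.0.5.tar.bz2 (comment 10)
CHUNK_SIZE = 32 * 1024   # assumption: "32k" means 32768 bytes

# Split the file into full 32k writes plus whatever is left over.
full_chunks, tail = divmod(FILE_SIZE, CHUNK_SIZE)
print(full_chunks, tail)  # 278 2837
```

So a complete download would be 278 full 32k writes followed by one short write of 2837 bytes, and the short write is the one missing from the bad strace.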

It looks like the thread which reads from the network isn't signalling that the last set of bytes is there, but why, I have no idea.

It also seems very strange that this is filesystem-specific; whatever it is, there isn't anything in the file writing thread that would hint at this.   I also can't duplicate it on my end.   I wonder if it's something stupid like the writes are returning much faster, and this is triggering a race condition in firefox.  Maybe some other thread is checking to see when the write is completing by stat'ing the fd, or something stupid like that.

Something that might be worth trying is to chattr +S your downloads directory, which will force a sync after every write, and see if that makes a difference when you download the file from scratch.   You'll want to do a "chattr -S downloads downloads/*" afterwards, since a sync after every write does a real number on performance.   But if that causes firefox to succeed, then it's probably some weird timing/race condition problem in firefox.
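
For comparison, the effect of chattr +S can be roughly approximated from user space by flushing and syncing after every write. This is only a sketch of the idea, not what firefox or the kernel actually does; the function name `write_synchronously` and the chunk size are illustrative:

```python
import os


def write_synchronously(path, data, chunk_size=32 * 1024):
    """Write `data` to `path` in chunks, forcing each chunk to disk."""
    with open(path, "wb") as f:
        for offset in range(0, len(data), chunk_size):
            f.write(data[offset:offset + chunk_size])
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to commit it to storage
    return os.path.getsize(path)
```

Each write then returns only after the data has been committed, which slows the writer down and changes the timing that a race condition would depend on.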

I'll note that I can't reproduce this on my firefox on my Ubuntu/Hardy system.

BTW, how many CPUs do you have, and which version of Firefox are you running?
Comment 17 Theodore Tso 2009-05-19 18:40:47 UTC
Any luck reproducing this problem?   Especially on a more recent kernel version?

If I don't get a response, I plan to close this bug, since we've fixed a lot of problems in the last couple of months....
Comment 18 Avery Fay 2009-05-20 20:20:44 UTC
Sorry, this slipped my mind. It was almost certainly a race condition in firefox. I ended up cleaning up a bunch of old stuff in my home directory (specifically in the folder that I was downloading to) and it just stopped happening altogether.
