Bug 12424

Summary: ext4_da_writepages error while downloading a file w/firefox
Product: File System
Component: ext4
Reporter: Avery Fay (avery)
Assignee: Eric Sandeen (sandeen)
Status: RESOLVED CODE_FIX
Severity: normal
CC: sandeen, tytso
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27.10
Subsystem:
Regression: No
Bisected commit-id:
Attachments: backtrace from console, fsck part 1, fsck part 2

Description Avery Fay 2009-01-10 19:52:56 UTC
Distribution:

Debian testing/unstable

Software Environment:

Kernel is actually from http://wiki.debian.org/DebianKernel. Version is 2.6.27-1~experimental.1~snapshot.12516, which appears to be based on 2.6.27.10.

Problem Description:

I was downloading a file in firefox when my machine started beeping and a stack trace was repeatedly printed to a terminal (see photo).

Steps to reproduce:

This may or may not be related, but ever since I backed up and restored to a new ext4 filesystem, I've had a lot of trouble downloading files with firefox. The symptom is that a download reaches 100% and then hangs. The directory contains the usual .part file, but firefox never renames it to the proper name (i.e. whatever.part -> whatever). Most downloads still complete ok, but I'd estimate this has happened ~5 times since I went from ext3 to ext4. Also, I'm pretty sure that when this happens the downloaded file isn't 100% correct. The one time I checked, the md5sum did not match.
Comment 1 Avery Fay 2009-01-10 19:53:39 UTC
Created attachment 19740
backtrace from console
Comment 2 Eric Sandeen 2009-01-10 20:02:20 UTC
This is journal_start failing:

                /* start a new transaction*/
                handle = ext4_journal_start(inode, needed_blocks);
                if (IS_ERR(handle)) {
                        ret = PTR_ERR(handle);
                        printk(KERN_EMERG "%s: jbd2_start: "
                               "%ld pages, ino %lu; err %d\n", __func__,
                                wbc->nr_to_write, inode->i_ino, ret);
                        dump_stack();
                        goto out_writepages;
                }

with error -30, which is EROFS.
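As a minimal userspace sketch of that decoding, assuming Linux errno numbering (where EROFS is 30): kernel functions report failure as a negative errno value, so "err -30" in the printk output corresponds to errno 30. The helper names below are hypothetical and are not part of ext4 or jbd2.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if a kernel-style return code
 * (a negative errno value) means "read-only file system". */
int is_readonly_fs_error(long err)
{
    return err == -EROFS;
}

/* Hypothetical helper: turns a kernel-style return code into the
 * C library's description of the matching positive errno value. */
const char *describe_kernel_error(long err)
{
    return strerror((int)-err);
}
```

Passing the -30 from the console output to is_readonly_fs_error() should return nonzero on Linux, and describe_kernel_error(-30) returns the C library's message for EROFS.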

Were there other messages before this, maybe an errors=remount-ro sort of filesystem problem? Did the filesystem go read-only?
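One way to check the read-only question on a running system is to look at the mount options the kernel currently reports. The sketch below assumes /proc/mounts is available and uses the glibc mntent interface. The function names are hypothetical. Options are compared as whole comma-separated tokens so that "errors=remount-ro" is not mistaken for "ro".

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the comma-separated option string contains exactly `name`. */
int has_option(const char *opts, const char *name)
{
    size_t len = strlen(name);
    const char *p = opts;

    while (*p != '\0') {
        size_t tok = strcspn(p, ",");   /* length of the current token */
        if (tok == len && strncmp(p, name, len) == 0)
            return 1;
        p += tok;
        if (*p == ',')
            p++;                        /* skip the separator */
    }
    return 0;
}

/* Returns 1 if mountpoint is mounted read-only, 0 if read-write,
 * and -1 if it is not listed or /proc/mounts cannot be read. */
int is_mounted_readonly(const char *mountpoint)
{
    FILE *mounts = setmntent("/proc/mounts", "r");
    struct mntent *entry;
    int result = -1;

    if (mounts == NULL)
        return -1;

    /* Keep scanning: the last matching entry is the most recent mount. */
    while ((entry = getmntent(mounts)) != NULL) {
        if (strcmp(entry->mnt_dir, mountpoint) == 0)
            result = has_option(entry->mnt_opts, "ro");
    }
    endmntent(mounts);
    return result;
}
```

A filesystem that was mounted with errors=remount-ro and then hit an error would be expected to show "ro" among its options here, so is_mounted_readonly() on its mount point would return 1.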
Comment 3 Avery Fay 2009-01-10 20:08:23 UTC
Yes, the filesystem went read-only. Unfortunately, I wasn't in the room when it happened, and by the time I came in, dmesg was just filled with this backtrace. If it helps, I snapped pictures of the fsck run on reboot too. I'll attach those.
Comment 4 Avery Fay 2009-01-10 20:08:50 UTC
Created attachment 19741
fsck part 1
Comment 5 Avery Fay 2009-01-10 20:09:11 UTC
Created attachment 19742
fsck part 2
Comment 6 Theodore Tso 2009-01-12 07:07:35 UTC
There was at least one ext4 bug that could have caused this that was fixed in 2.6.28 as well as in the 2.6.29 merge window. Unfortunately, we need to see the original failure to know which bug might have caused the problem. There was also a more recent bug fix that makes ext4_da_writepages much less verbose, since the flood of messages tends to hide the original root cause of the problem.
Comment 7 Theodore Tso 2009-05-19 18:41:23 UTC
Any luck reproducing this problem, especially on a more recent kernel version?
Comment 8 Avery Fay 2009-05-20 20:22:27 UTC
This has not happened again, either on 2.6.27 or on 2.6.29 (which I'm running now).
Comment 9 Eric Sandeen 2009-08-18 17:36:31 UTC
Assigning to myself so I can close.
Comment 10 Eric Sandeen 2009-08-18 17:37:25 UTC
It sounds like this is fixed, though unfortunately we don't have a nice commit to point to.  If this shows up again, please do re-open.

Thanks,
-Eric