Bug 83321 - file corruption with reiserfs partitions under 3.16.0-1, inclusive.
Summary: file corruption with reiserfs partitions under 3.16.0-1, inclusive.
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-27 08:12 UTC by Bob Raitz
Modified: 2016-03-21 18:16 UTC
CC List: 6 users

See Also:
Kernel Version: 3.16.0-1 inclusive
Subsystem:
Regression: No
Bisected commit-id:

Description Bob Raitz 2014-08-27 08:12:36 UTC
Under 3.16.0, portage, the package management system behind Gentoo/Funtoo's emerge, suffers severe file corruption (truncation) with each attempt to sync portage to the global tree.

On a different system, also running 3.16.0, the file corruption was not limited to portage and python (portage is written in python). There was also extensive corruption noted under /etc.

The giveaway was that one of my systems, which is not currently running 3.16.x, has had none of these issues. They have only been occurring on the systems running 3.16.x. One is an older Gateway laptop, and the other is a Core 2 running 64-bit. The 64-bit machine seemed much more sensitive than the laptop.

Here is the Gentoo discussion thread:

http://forums.gentoo.org/viewtopic-p-7607392.html
Comment 1 Matthew 2014-08-28 15:07:12 UTC
most probably dupe of https://bugzilla.kernel.org/show_bug.cgi?id=83121
Comment 2 Alexander Bezrukov 2014-09-04 13:44:29 UTC
I can confirm that I also hit this bug with 3.16.1. With 3.15.7 I see no manifestation of this issue. It seems to happen on reiserfs filesystems with lots of small, frequently changing files (the portage tree is a good example).

My reiserfs partition is on a hardware RAID array whose controller is set up to ignore FUA requests (with RAM journalling and backup enabled, this is probably the most typical setup on many servers). This may be a factor. I see no manifestation on my laptop running Gentoo, where the same partition for the portage tree is on a standard SATA HDD.
Comment 3 Alexander Bezrukov 2014-09-04 20:34:39 UTC
(In reply to Alexander Bezrukov from comment #2)

Please disregard my last comment. I reproduced this issue on a filesystem on a standard SATA drive, too. For some reason it didn't manifest at the beginning, but it is now easily reproducible.
Comment 4 Alexander Bezrukov 2014-09-20 12:32:45 UTC
This is probably a duplicate of bug 83121.

I can confirm that with 3.16.3 the issue went away; I cannot reproduce it anymore.

From the changelog:
commit 9ae91b17b20eafecf8dc4416f86383c76dcdc6a4
Author: Jan Kara <jack@suse.cz>
Date:   Wed Aug 6 19:43:56 2014 +0200

    reiserfs: Fix use after free in journal teardown
    
    commit 01777836c87081e4f68c4a43c9abe6114805f91e upstream.
    
    If do_journal_release() races with do_journal_end() which requeues
    delayed works for transaction flushing, we can leave work items for
    flushing outstanding transactions queued while freeing them. That
    results in use after free and possible crash in run_timers_softirq().
    
    Fix the problem by not requeueing works if superblock is being shut down
    (MS_ACTIVE not set) and using cancel_delayed_work_sync() in
    do_journal_release().
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 906f27708b9126cd6793e35094d283f5f259ec0b
Author: Jeff Mahoney <jeffm@suse.com>
Date:   Mon Aug 4 19:51:47 2014 -0400

    reiserfs: fix corruption introduced by balance_leaf refactor
    
    commit 27d0e5bc85f3341b9ba66f0c23627cf9d7538c9d upstream.
    
    Commits f1f007c308e (reiserfs: balance_leaf refactor, pull out
    balance_leaf_insert_left) and cf22df182bf (reiserfs: balance_leaf
    refactor, pull out balance_leaf_paste_left) missed that the `body'
    pointer was getting repositioned. Subsequent users of the pointer
    would expect it to be repositioned, and as a result, parts of the
    tree would get overwritten. The most common observed corruption
    is indirect block pointers being overwritten.
    
    Since the body value isn't actually used anymore in the called routines,
    we can pass back the offset it should be shifted. We constify the body
    and ih pointers in the balance_leaf as a mostly-free preventative measure.
    
    Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
    Signed-off-by: Jeff Mahoney <jeffm@suse.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
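
For readers unfamiliar with this class of bug, here is a minimal userspace sketch of what the second commit message describes. It is not the reiserfs code: paste_left, split_buggy and split_fixed are hypothetical names, and plain byte buffers stand in for tree items. The helper takes a const body pointer and passes back the offset by which body should be shifted; the buggy caller ignores that offset, so the bytes already moved to the left neighbour are written a second time.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a reiserfs function): move the first n bytes
 * of body into the left neighbour. body is const, so the helper cannot
 * reposition it; instead it returns how far the caller should shift it.
 */
static size_t paste_left(char *left, const char *body, size_t n)
{
    memcpy(left, body, n);
    return n;
}

/* Buggy caller: drops the returned offset and keeps using the old body. */
static void split_buggy(const char *body, size_t len, size_t n,
                        char *left, char *cur)
{
    paste_left(left, body, n);
    /* body still points at the start, so the moved bytes are copied again */
    memcpy(cur, body, len - n);
}

/* Fixed caller: shifts body by the offset the helper passed back. */
static void split_fixed(const char *body, size_t len, size_t n,
                        char *left, char *cur)
{
    body += paste_left(left, body, n);
    memcpy(cur, body, len - n);
}
```

With body "ABCDEF" and n = 2, split_fixed leaves "AB" on the left and "CDEF" in the current buffer, while split_buggy leaves "ABCD" in the current buffer. In the real tree the misplaced write lands on neighbouring data, which would be consistent with the overwritten indirect block pointers mentioned in the commit message.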
