Bug 33112 - Possible file corruption (RAID10+LVM+ext4) with chromium and kernel build
Status: RESOLVED DUPLICATE of bug 32972
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: io_other
 
Reported: 2011-04-12 07:56 UTC by Giacomo Catenazzi
Modified: 2011-04-13 19:03 UTC

Kernel Version: 2.6.39-rc2
Regression: Yes

Description Giacomo Catenazzi 2011-04-12 07:56:06 UTC
[Set to "Component: Other" because of the long chain of block layers involved]

Just after solving bug #32062, I did a system update (Debian unstable: new gcc and binutils + gold, though I don't think gold is enabled by default).

Now at every kernel update I get a linker error, usually in some **/built-in.o (something like "invalid character in 1,1") or in some *.lds or *.dbg files (invalid/empty file).

This can be worked around with a make mrproper and a complete rebuild.

I first suspected a Debian toolchain problem, but earlier today I checked further and found that the **/built-in.o files seem to contain the Chromium history (Debian's chromium), which I think is saved frequently.
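A check like the one described above can be automated. The following is only a minimal sketch, not something used in this report: it assumes that a healthy built-in.o in this kernel generation starts either with the ELF magic (a relocatable object produced by the linker) or with the ar archive magic (an empty archive for directories without objects). The function name `suspicious_objects` is hypothetical. It only looks at the first bytes, so it would miss corruption further into the file.

```python
import os

# Assumed "healthy" headers for built-in.o (an assumption, see text above).
ELF_MAGIC = b"\x7fELF"
AR_MAGIC = b"!<arch>\n"


def suspicious_objects(build_dir):
    """Yield (path, reason) for built-in.o files that are empty or
    do not start with one of the expected magic numbers."""
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if name != "built-in.o":
                continue
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                head = f.read(8)
            if not head:
                yield path, "empty file"
            elif not (head.startswith(ELF_MAGIC) or head == AR_MAGIC):
                yield path, "unexpected header %r" % head


if __name__ == "__main__":
    # Run from the top of the kernel build tree.
    for path, reason in suspicious_objects("."):
        print(path, reason)
```

Running it right after a build and again after a reboot would show whether files that looked fine from the page cache come back damaged from disk.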

BTW I use the browser a lot while building kernels, and I often shut down the system just after the kernel build.

So I think there is a problem with flushing the buffers to disk.

You can see my setup (and some dmesg) in https://bugzilla.kernel.org/show_bug.cgi?id=32062 and in https://bugzilla.kernel.org/show_bug.cgi?id=24012 , but in short: 4 disks in a RAID10, with LVM on top of them and some ext4 filesystems on the LVM volumes. I usually have a lot of free memory.

In the next few days I'll try to bisect the bug, but it will be slow, especially if the bug is not 100% reproducible. BTW: are there any tools to detect possible disk corruption earlier?

ciao
    cate
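One generic way to notice this kind of corruption earlier is to record checksums of a tree while the data is still in the page cache, then re-read the files after the cache has been dropped (for example after a reboot) and compare. The sketch below is an illustration under those assumptions, not a tool referenced in this bug; the names `snapshot` and `silently_changed` are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map relative path -> (mtime_ns, size, sha256) for regular files."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            result[rel] = (st.st_mtime_ns, st.st_size, sha256_of(path))
    return result


def silently_changed(before, after):
    """Files whose content differs although mtime and size are unchanged,
    i.e. nobody is supposed to have rewritten them in between."""
    return sorted(
        path
        for path, (mtime, size, digest) in before.items()
        if path in after
        and after[path][:2] == (mtime, size)
        and after[path][2] != digest
    )
```

A possible workflow: take a `snapshot` right after the build, store it (for example as JSON), reboot, take a second `snapshot`, and inspect whatever `silently_changed` reports. A file that was legitimately rewritten gets a new mtime and is not reported.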
Comment 1 Giacomo Catenazzi 2011-04-13 07:12:13 UTC
I have not finished bisecting the kernel yet (I'll continue later), but I've narrowed it down to a few commits (probably 3 ext4 commits or 2 block layer commits). Here is the log:

git bisect log 
# bad: [6aba74f2791287ec407e0f92487a725a25908067] Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
# good: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect start '6aba74f' 'v2.6.38'
# good: [61ef46fd45c3c62dc7c880a45dd2aa841b9af8fb] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
git bisect good 61ef46fd45c3c62dc7c880a45dd2aa841b9af8fb
# good: [6447f55da90b77faec1697d499ed7986bb4f6de6] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
git bisect good 6447f55da90b77faec1697d499ed7986bb4f6de6
# good: [69d1fe18e92afb4687605a1ab2ec73fbc3bae344] mmc: tmio: only access registers above 0xff, if available
git bisect good 69d1fe18e92afb4687605a1ab2ec73fbc3bae344
# bad: [17c6dd8144924e3c71930636091704da6d043536] Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
git bisect bad 17c6dd8144924e3c71930636091704da6d043536
# bad: [00a2470546dd8427325636a711a42c934135dbf5] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git bisect bad 00a2470546dd8427325636a711a42c934135dbf5
# bad: [94df491c4a01b39d81279a68386158eb02656712] Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 94df491c4a01b39d81279a68386158eb02656712
# bad: [f9b08d9c606498584e1fb05ab95a575e52f0f8e2] MIPS: Remove useless initialization.
git bisect bad f9b08d9c606498584e1fb05ab95a575e52f0f8e2
# good: [0562e0bad483d10e9651fbb8f21dc3d0bad57374] ext4: add more tracepoints and use dev_t in the trace buffer
git bisect good 0562e0bad483d10e9651fbb8f21dc3d0bad57374
# good: [cccb4d063b263ac0713ab27d98460fda3b4f83ff] NFSv4.1 remove temp code that prevented ds commits
git bisect good cccb4d063b263ac0713ab27d98460fda3b4f83ff
Comment 2 Giacomo Catenazzi 2011-04-13 19:03:03 UTC
Finished the bisect, and damn ... the commit has just been reverted. The bisect result was:

6de9843dab3f2a1d4d66d80aa9e5782f80977d20 is the first bad commit
commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20
Author: Feng Tang <feng.tang@intel.com>
Date:   Wed Mar 23 14:05:03 2011 -0400

    ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep()
    
    The map_bh() call will have already set the buffer_head to mapped.
    
    Signed-off-by: Feng Tang <feng.tang@intel.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


but it has now been reverted by
commit c8205636029fc869278c55b7336053b3e7ae3ef4
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sun Apr 10 22:30:07 2011 -0400

    ext4: fix data corruption regression by reverting commit 6de9843dab3f
    
    Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it
    caused a data corruption regression with BitTorrent downloads.  Thanks
    to Damien for discovering and bisecting to find the problem commit.
    
    https://bugzilla.kernel.org/show_bug.cgi?id=32972
    
    Reported-by: Damien Grassart <damien@grassart.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
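To check whether a given kernel tree is exposed to this regression, one can ask git whether the bad commit is reachable from the checked-out revision while the revert is not. The sketch below wraps `git merge-base --is-ancestor` for that purpose. The helper names `is_ancestor` and `is_affected` are hypothetical, and the check is only an approximation: it compares by commit hash, so a tree that carries the revert as a cherry-pick under a different hash would be reported as affected.

```python
import subprocess

# Commit hashes taken from the bisect result and the revert quoted above.
BAD = "6de9843dab3f2a1d4d66d80aa9e5782f80977d20"
REVERT = "c8205636029fc869278c55b7336053b3e7ae3ef4"


def is_ancestor(commit, rev="HEAD", repo=".", run=subprocess.run):
    """Return True if `commit` is reachable from `rev` in `repo`.

    `git merge-base --is-ancestor` exits with 0 for "yes" and 1 for "no";
    any other exit status is treated as an error.
    """
    argv = ["git", "-C", repo, "merge-base", "--is-ancestor", commit, rev]
    result = run(argv)
    if result.returncode not in (0, 1):
        raise RuntimeError("git failed with status %d" % result.returncode)
    return result.returncode == 0


def is_affected(rev="HEAD", repo=".", run=subprocess.run):
    """True if `rev` contains the bad commit but not its revert."""
    return is_ancestor(BAD, rev, repo, run) and not is_ancestor(
        REVERT, rev, repo, run
    )


if __name__ == "__main__":
    print("affected" if is_affected() else "not affected (by hash)")
```

The `run` parameter exists only so the logic can be exercised without a real repository; in normal use the default `subprocess.run` is fine.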
Comment 3 Giacomo Catenazzi 2011-04-13 19:03:34 UTC

*** This bug has been marked as a duplicate of bug 32972 ***
