Bug 6380

Summary: XFS corruption with Linux 2.6.16
Product: File System
Reporter: Martin Steigerwald (Martin)
Component: XFS
Assignee: XFS Guru (xfs-masters)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.16.4
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: output of xfs_check for occasion 3 from the bug report
output of xfs_repair for occasion 3 from the bug report
kernel config of 2.6.16.4 under which I had occasion 2 and 3 from the bug report
kernel config of the latest 2.6.15 kernel I had in use
output of lspci -nvv for my IBM ThinkPad T23

Description Martin Steigerwald 2006-04-12 07:26:27 UTC
Most recent kernel where this bug did not occur: no later kernel version tested
Distribution: Debian Linux Etch/Sid
Hardware Environment: IBM ThinkPad T23, P3 with 1.13 GHz, 384 MB RAM
Software Environment: Kernel 2.6.16.1, kernel 2.6.16.4 with sws2 2.2.4
Problem Description:

I get severe XFS corruption on random occasions. It happens with my Debian
Linux root partition. /home on a different XFS partition has not been affected
so far (lucky me). OpenSUSE 10 on yet another partition was not affected
either, though I only used OpenSUSE to recover my broken Debian partition.

I have no reproducible pattern that triggers it; it just happens. I got XFS
corruption three times within one week:

1) I don't know when it happened, but I noticed it when dpkg complained about
several errors in /var/lib/dpkg/available. I first suspected that the file was
corrupted due to some bug in Debian package management, but then found out that
it contained lots of garbage characters at the end. In the middle of the file
some text was missing or duplicated.

I booted into OpenSUSE 10 and xfs_check reported errors beyond the usual stuff
about old deleted files (agi unlinked node or something like that) that can
easily be fixed. There were only a few errors, and I restored the "available"
file via "apt-cache dumpavail".

2) The next time, I wanted to start a mindmap in kdissert, a mind mapping tool
for KDE, which I used with KDE 3.5.2. I just clicked around a bit, added an
item or two, and then the machine became unresponsive and finally X.org
(modular X.org 7 from Debian experimental) died. Then the machine seemed to be
locked up completely. I finally switched it off.

The machine did not boot again, but GRUB found its menu.lst and I managed to
boot into OpenSUSE 10. OpenSUSE (kernel 2.6.13.5 or something like that) could
not mount my root partition: error 990. I do not remember what happened with
xfs_check... I think it reported tons of errors, or I started with xfs_repair
straight away. I had to use xfs_repair -L to force log zeroing. It reported
tons of stuff. Unfortunately I did not log it to a file.

Debian Linux booted again, up to KDE 3.5, nicely. I tried to repair it, but
finally gave up because of about 200 MB of stuff in lost+found. I restored a
backup from my external USB hard disk via rsync.

This was yesterday. I updated my system from 2.6.15.6 to 2.6.16.4, before I had 
2.6.16.1 in use and the third crash happened.

martin@deepdance:~ -> dpkg -l | grep kdissert
ii  kdissert                1.0.5.debian-3          mindmapping tool
(I doubt it's related to kdissert)

3) Today XFS got corrupted again. I had an extensive apt-get update running to
make up for the three weeks since the backup I had restored, and it also
installed a new koffice version (release 1.5). I wanted to try out kword, but
it crashed straight away. I tried from the console: bash told me "error while
starting the executable". I did apt-get --reinstall install kword - then it
worked.

Ok, once again OpenSUSE 10 and xfs_check. Errors again. Quite a few. This time 
I made a log file. Then xfs_repair, also with log file. I attach those two to 
this bug report.

One thing I found was that, at least with 2) and 3), I had an empty file /core
in the corrupted XFS filesystem. I thought about the possibility of a kernel
crash that overwrote XFS in-memory data structures, but I learned that the
Linux kernel itself usually does not dump core to the filesystem.

On occasion 3, when I compiled 2.6.16.4, I made sure to disable core dumping
for ELF files. I still got that empty /core in the corrupted Debian root
filesystem.


Steps to reproduce:

I am not really interested in reproducing this ;-). Well, I have no idea.
Probably use a similar kernel and similar hardware, and try to use that system
productively for a while.

I have not had any XFS corruption during my usage of the various 2.6.15 kernels 
I had in use.

This bug report is probably related to:
  #6180


I will revert to 2.6.15.6 for now or even compile 2.6.15.7 as I can not afford 
the time to restore my Debian system from scratch once again. I will however 
restore it from the backup once again to make absolutely sure that it is 
consistent.

I know I probably won't be of much help debugging this, but I just don't have 
the resources to do fs debugging with my laptop that is in heavy productive use 
and at least at home I have no spare system either.

I may try again with 2.6.17 as soon as I am convinced that it's stable enough.
Comment 1 Martin Steigerwald 2006-04-12 07:29:02 UTC
Created attachment 7845 [details]
output of xfs_check for occasion 3 from the bug report
Comment 2 Martin Steigerwald 2006-04-12 07:30:24 UTC
Created attachment 7846 [details]
output of xfs_repair for occasion 3 from the bug report
Comment 3 Martin Steigerwald 2006-04-12 07:33:19 UTC
I suspected defective RAM on my IBM ThinkPad, so I ran memtest86 for an hour.
It reported no errors. I have the 128 MB that were originally in that ThinkPad
plus 256 MB of Kingston RAM that is made with timings especially for the IBM
ThinkPad T23.

I will now be running 2.6.15 again and report whether I get corruption again. I 
think I won't.

The other bug report I think this is related to, with the complete link, is:
http://bugzilla.kernel.org/show_bug.cgi?id=6180

Does bugzilla generate a link for this? bug #6180
Comment 4 Martin Steigerwald 2006-04-12 07:57:30 UTC
I know this is not a plain vanilla kernel, because I use sws2, but I had been
using 2.6.15 with sws2 for months without getting XFS corruption three times a
week. I would love to test it without sws2, but as I mentioned, I cannot afford
the time to restore my backup again and again.
Comment 5 Martin Steigerwald 2006-04-12 07:59:38 UTC
Created attachment 7847 [details]
kernel config of 2.6.16.4 under which I had occasion 2 and 3 from the bug report
Comment 6 Martin Steigerwald 2006-04-12 08:01:13 UTC
Created attachment 7848 [details]
kernel config of the latest 2.6.15 kernel I had in use

I did not have an XFS corruption problem with that kernel, nor with earlier
2.6.15 kernels
Comment 7 Martin Steigerwald 2006-04-12 08:02:20 UTC
Created attachment 7849 [details]
output of lspci -nvv for my IBM ThinkPad T23
Comment 8 Nathan Scott 2006-04-12 23:12:19 UTC
Classic symptoms of the write cache being enabled on your drive.
Switch it off, or try a recent kernel with the -o barrier option
(this will be on by default in 2.6.17).

cheers.
Comment 9 Martin Steigerwald 2006-04-13 05:44:40 UTC
Many thanks for your prompt answer.

Indeed, the write cache appears to have been switched on, according to this:

root@deepdance:~ -> hdparm -i /dev/hda | grep WriteCache
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled

and this:

root@deepdance:~ -> cat /etc/hdparm.conf
[...]
/dev/hda {
       mult_sect_io = 16
       write_cache = on
       dma = on
       apm = 0
       acoustic_management = 128
       io32_support = 3
       keep_settings_over_reset = on
       interrupt_unmask = on
}

I know no way to query it directly.

I switched it to off immediately using hdparm -W 0 and set it to off in the 
hdparm.conf file as well.

I take it from your comment that this should be set to off with kernel 2.6.15
as well. I know that SmartFilesystem for AmigaOS relies on data that is to be
flushed to disk being written immediately, before the call returns, in order
to ensure a certain order for atomic writes. So this is the case with XFS as
well?

Well ok, Documentation/block/barrier.txt sheds some light on this - just for 
other readers of this bug:

"There are four cases,

i. No write-back cache.  Keeping requests ordered is enough.

ii. Write-back cache but no flush operation.  There's no way to
gurantee physical-medium commit order.  This kind of devices can't to
I/O barriers.

iii. Write-back cache and flush operation but no FUA (forced unit
access).  We need two cache flushes - before and after the barrier
request."

Ok, so either barriers on or write cache off. Got this.
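
For other readers, the three cases quoted above can be spelled out as a small
shell function. This is only an illustrative sketch: the function name
barrier_strategy and its yes/no arguments are made up here and are not part of
any kernel interface, and the fourth case is left as a placeholder because it
is not quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper: map drive capabilities to the handling described in
# the barrier.txt excerpt quoted above.
# Arguments (each "yes" or "no"):
#   $1 = drive has a write-back cache
#   $2 = drive supports a cache flush operation
#   $3 = drive supports FUA (forced unit access)
barrier_strategy() {
    cache=$1
    flush=$2
    fua=$3
    if [ "$cache" = no ]; then
        # Case i: no write-back cache.
        echo "ordering is enough"
    elif [ "$flush" = no ]; then
        # Case ii: cache but no flush; commit order cannot be guaranteed.
        echo "barriers not possible"
    elif [ "$fua" = no ]; then
        # Case iii: cache and flush, but no FUA.
        echo "flush before and after the barrier request"
    else
        # Case iv is not quoted in this comment; see barrier.txt itself.
        echo "case iv (see barrier.txt)"
    fi
}

barrier_strategy yes yes no
```

A drive with write cache enabled falls under case ii, iii or iv, which is why
either working barriers or a disabled write cache is needed.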

"-o barrier" is a mount option? I cannot find it documented anywhere.

Any hints on why it may have worked quite well with 2.6.15, while I got three
corruptions in one week with 2.6.16? Is it just coincidence, or were there some
write cache related changes in 2.6.16? During my 2.6.15 time I had quite a few
3D savage DRI driver lockups without any data loss.

I lowered the severity to normal, as it seems from your comment that this is a
misconfiguration on my side. Feel free to raise it again if that seems
appropriate to you. If, according to hdparm -i /dev/hda, the drive defaults to
WriteCache enabled, this may have severe implications. At least the default
setting should never burn any data --> kernel 2.6.17 with -o barrier on by
default.

I will test kernel 2.6.16.4 with write cache off in hdparm, and when that
works, I may try with barrier on and write cache on - I am still a bit scared
ATM. When both work, this bug can be closed. A hint in the xfs.txt readme would
still be in order IMHO until 2.6.17 is standard.

Hard drive in my laptop:

root@deepdance:~ -> smartctl -i /dev/hda
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Travelstar 5K80 family
Device Model:     HTS548060M9AT00
Serial Number:    MRLB21L4G6G3DC
Firmware Version: MGBOA50A
User Capacity:    60.011.642.880 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 3a
Local Time is:    Thu Apr 13 14:18:21 2006 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Regards,
Martin
Comment 10 Martin Steigerwald 2006-04-13 06:03:24 UTC
How do I query the drive whether write caching is on or off to make sure that 
hdparm -W 0 /dev/hda or my entry in /etc/hdparm.conf actually worked the way it 
should? hdparm -W does not seem to be able to query the drive status regarding 
write cache. Any hints?
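
One approach that may work, sketched here under assumptions: hdparm -I prints
the drive's identification data, and its feature list appears to mark enabled
features with a leading "*" (for example "   *    Write cache"). The helper
name wcache_state is made up for this sketch, and the exact output layout may
differ between hdparm versions, so treat the result as a hint rather than
proof.

```shell
#!/bin/sh
# Hypothetical helper: classify the write cache state from `hdparm -I`
# output read on stdin.
# Assumption: an enabled feature is listed with a leading "*"; a supported
# but disabled feature is listed without one.
wcache_state() {
    awk '
        /Write cache/ {
            found = 1
            # With default field splitting, "*" is the first field only
            # when the feature is marked as enabled.
            state = ($1 == "*") ? "enabled" : "disabled"
        }
        END {
            if (found) print state
            else print "unknown"
        }
    '
}

# Example (needs root and a real device; /dev/hda as in this report):
#   hdparm -I /dev/hda | wcache_state
```

Running it once before and once after hdparm -W 0 /dev/hda should show whether
the setting actually reached the drive.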
Comment 11 Martin Steigerwald 2006-04-13 09:06:26 UTC
One additional thought: can XFS detect common cases where write operation is
dangerous (ATM only write cache on with barrier off comes to my mind) and do a
read-only mount in that case, issuing an error message explaining why it does
so?

IMHO a filesystem should follow a better-safe-than-sorry strategy where
possible when it comes to the risk of data corruption. And at least here it
shouldn't have any serious performance impact.
Comment 12 Martin Steigerwald 2006-05-04 04:18:39 UTC
I tested for about one week in total with write cache off and kernel 2.6.16
(with 2.6.16.4 and 2.6.16.11). I had no further XFS crashes.

I am now using 2.6.15 again, because ALSA sound output does not work after
suspend to disk with 2.6.16 (a different bug, I know).
Comment 13 Martin Steigerwald 2006-05-09 08:23:54 UTC
Just for information for readers of this bug report:

linux-xfs mailing list:
  TAKE 912426 - write barrier support
  http://oss.sgi.com/archives/linux-xfs/2005-09/msg00044.html
  TAKE 912426 - disable barriers by default
  http://oss.sgi.com/archives/linux-xfs/2005-11/msg00164.html

linux-kernel mailing list:
  [PATCH] enable XFS write barrier
  http://marc.theaimsgroup.com/?l=linux-kernel&m=113802736929484
Comment 14 Martin Steigerwald 2006-06-20 12:57:52 UTC
There is now a FAQ entry about the write cache issue in the XFS FAQ from SGI:

http://oss.sgi.com/projects/xfs/faq.html#wcache

I would like to try it with kernel 2.6.17, and then also with write caches
enabled again, but I want to wait a little longer before I switch to 2.6.17,
since it's rather new at the moment. (After all, this is a production system,
not a test machine.)
Comment 15 Krzysztof Rusocki 2006-06-20 13:29:32 UTC
Humm, can anybody explain or give some pointers about this write cache
effect?

I mean - I don't seem to understand how the corruption can occur
while the system is *continuously* online.

I am assuming that after each 'event' the fs was repaired and the machine was
rebooted.

[or maybe that isn't the case here?]

Thanks!
Comment 16 Martin Steigerwald 2006-06-21 00:17:48 UTC
There is some kernel documentation about the write barrier stuff. I have no
unpacked 2.6.16 kernel source at hand here currently, but you should be able to
find it by using find -name "*barrier*" or grep -ir "barrier" *. That explains
the issue quite nicely, as does the SGI FAQ I posted before.

But actually, even after reading it I do not understand this issue completely.

I do not understand why I got three crashes with 2.6.16 in one week, while with
2.6.15 it worked quite stably. It was not perfect with 2.6.15, but at least I
only got XFS corruption rarely, after a DRI savage driver crash or when suspend
to disk did not work correctly - when the machine was not online, as you say.
Actually XFS survived most of those crashes nicely. With 2.6.16, at least once
- when I used kdissert - the kernel just went down while I was using the
machine regularly (no 3D stuff and no suspend to disk issues). Even if kdissert
/ KDE somehow managed to crash X.org, the kernel should still have been alive
and X.org should have been restarted. So either kernel 2.6.16 was a lot more
unstable than 2.6.15 in the beginning, or XFS had an issue with enabled write
cache that occurred while it was running and not only on power outages and
kernel crashes. I had no kernel crashes during regular use with 2.6.16 once I
disabled the write cache, which may point at the second alternative.

I repaired the filesystem after each event, either by using xfs_repair or, when
the damage was too big, by restoring a backup via rsync.

Anyway, I think it's best to test with 2.6.17 again with barrier functionality
and write cache enabled. I will do so once 2.6.17 has matured a bit more and I
do not hear about new issues, because this is a production machine and I lose
quite some time on each filesystem crash that happens.
Comment 17 Martin Steigerwald 2006-07-16 02:24:53 UTC
Hello, OK, I had a three-week test period with 2.6.17.1 + the xfs-fix for
kernel bug #6757 (that one is really needed and IMHO should go into a stable
kernel patch as soon as possible!) + sws2 2.2.6. One week with disabled write
caches, one week with enabled write caches and the barrier mount option
mentioned in /etc/fstab, and one week with enabled write caches and the barrier
mount option not mentioned in /etc/fstab, thus specifically testing whether
it's really the default now.

No problems. xfs_check on the root partition showed three agi unlinked buckets
that xfs_repair fixed, but from what I know these are no real defects. If that
shouldn't happen though, something still needs to be fixed. Please tell me if
that's the case.

In addition, I ran three tests in which I switched off the computer while
writing data to an XFS partition:

1) rsync -a /usr/src /destination/partition
2) ddrescue /dev/hda1 /destination/partition
3) 1 + 2 + rm -rf /that/usr/src/directory-from-test-one. The rm job had
completed a second before I switched off the laptop, but I am sure that the
other jobs were still running

Result: No problems. Not a single line of output from xfs_check after each of
the three tests.

So I am pretty much convinced that XFS is now working really stably with write
caches enabled, given that the patch from kernel bug #6757, which is unrelated
to the write cache issue, is applied.

Thank you, guys! 

Regards, Martin