Bug 2155 - I/O ( filesystem ) sync issue
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: i386 Linux
Importance: P2 normal
Assignee: XFS Guru
URL:
Keywords:
Duplicates: 2336
Depends on:
Blocks: 2336
Reported: 2004-02-19 21:58 UTC by Cheng
Modified: 2007-02-17 12:00 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.3
Subsystem:
Regression: ---
Bisected commit-id:


Description Cheng 2004-02-19 21:58:28 UTC
Distribution: 
Debian Sarge

Hardware Environment: 
Whitebox P4 2.4 Xeon server w/ 1G ECC Ram
Tyan S2723 Motherboard
3ware 7506 IDE Raid card

Software Environment: 
Plain Kernel 2.6.3
XFS
Raid 5
LVM 1.0

Problem Description: 
A file copy operation was performed just before a system reboot:
cp filea fileb

filea is smaller than 2k.

Right after the above operation, I typed "reboot" to reboot the system.
After boot, fileb has the correct size, but its content is completely empty.

It seems likely that XFS did not perform a sync before umount.

Steps to reproduce:
Tried three times, same result.
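
For anyone retrying this, here is a minimal sketch of the same copy with an
explicit flush before the reboot, assuming Python is available on the box. The
helper names copy_and_sync and files_identical are hypothetical and not part of
the original report. If fileb survives a reboot after an explicit fsync but not
after a plain cp, that would point at data not being flushed at shutdown rather
than at the copy itself.

```python
import filecmp
import os
import shutil


def copy_and_sync(src, dst):
    """Copy src to dst, then ask the kernel to flush dst to the device."""
    shutil.copyfile(src, dst)
    # Reopen the destination and fsync it so the data does not sit only
    # in the page cache when the machine goes down.
    with open(dst, "rb+") as f:
        os.fsync(f.fileno())


def files_identical(a, b):
    """Byte-for-byte comparison, to be run again after the reboot."""
    return filecmp.cmp(a, b, shallow=False)
```

Run copy_and_sync("filea", "fileb") before typing "reboot", then
files_identical("filea", "fileb") once the system is back up.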
Comment 1 XFS Guru 2004-02-19 22:05:55 UTC
Was the file in question on the root filesystem?  Can you verify that
it is only a problem on /, and not on other filesystems?  This may
be a problem with the remount,readonly code that the system goes
through for root (/) when shutting down.
Comment 2 Cheng 2004-02-21 23:01:22 UTC
The problem definitely occurred on an XFS partition. All my partitions are XFS
except /boot, which is reiserfs.

I'll try the same test on /boot to see whether it is directly related to XFS.
Comment 3 Cheng 2004-02-21 23:13:41 UTC
Another problem I have:

A couple of days ago I installed debsums (a Debian tool similar to tripwire)
on my sarge box, and it reported checksum mismatches for some files under /usr.
(I am sure the system is not compromised.)
I reinstalled the damaged files and kept observing.

Since then, one or two files are reported as checksum mismatches every day.
I have run diff to double-check that.
The files are still usable, though.

If I just leave the changed files alone, debsums sometimes reports no error a
couple of hours later.
There is any error message about this dmesg.

I don't think it's a hardware problem.

This problem is probably related to the first one.

The following files are the ones most often reported as checksum mismatches.
Most are system files.


debsums: checksum mismatch links file usr/bin/links.main
debsums: checksum mismatch dpkg file usr/bin/dpkg
debsums: checksum mismatch libglib2.0-0 file usr/lib/libglib-2.0.so.0.200.3
debsums: checksum mismatch sysstat file usr/bin/iostat
debsums: checksum mismatch libglib2.0-0 file usr/lib/libglib-2.0.so.0.200.3
debsums: checksum mismatch python2.3 file usr/lib/python2.3/Cookie.py
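
One way to narrow down mismatches that come and go is to hash the same files
several times in a row and record which ones ever change. The following is a
minimal sketch, assuming Python with hashlib; MD5 is used because debsums
checks files against MD5 sums. The function names md5_of and unstable_files
are hypothetical. A digest that changes between rounds while nothing writes to
the file would suggest the corruption is on the read side.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unstable_files(paths, rounds=3):
    """Return the paths whose digest differed between any two rounds."""
    first_seen = {}
    unstable = set()
    for _ in range(rounds):
        for path in paths:
            current = md5_of(path)
            # setdefault stores the first digest and returns it afterwards.
            if first_seen.setdefault(path, current) != current:
                unstable.add(path)
    return sorted(unstable)
```

For example, unstable_files(["/usr/bin/dpkg", "/usr/bin/iostat"], rounds=10)
returns an empty list if every read produced the same digest.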
Comment 4 Cheng 2004-02-21 23:15:16 UTC
s/There is any error message about this dmesg./There is no error message about this in dmesg./
Comment 5 Emilio Gargiulo 2004-03-20 13:04:48 UTC
*** Bug 2336 has been marked as a duplicate of this bug. ***
Comment 6 Emilio Gargiulo 2004-03-20 13:07:07 UTC
The problem seems to affect all XFS filesystems on LVM, LVM2 and RAID, root and
non-root.
Comment 7 Cheng 2004-03-21 01:58:13 UTC
Comment #3 seems to be a different issue, something related to reads.

I've tried #3 on kernel 2.4.25 and see the same problem.
Comment 8 Emilio Gargiulo 2004-03-22 13:38:42 UTC
Can I try to isolate the buggy code, with debugging enabled or in some other way?
Comment 9 Cheng 2004-03-23 02:29:18 UTC
Hi, Emilio

Do you mean "Can you try to..."?

Comment 10 Emilio Gargiulo 2004-03-24 15:07:50 UTC
Hi, Cheng
I am offering my ability to reproduce the issue to help identify the code that
causes it.
If you already have an idea about the cause, please tell us.
I think there is a relation between gcc and this bug. If I use gcc 3.2.3 I
have similar, but not identical, issues with 2.4.25. If I use gcc 3.3.1, 2.4.25
works fine.
With 2.6.x, XFS on LVM or RAID loses data with every gcc 3.X. I have not tried
gcc 2.9X.XX.
I'm not a hardcore coder, but perhaps I can be useful to the XFS developers in
isolating the faulty code...
Thanks
Comment 11 Cheng 2004-04-01 21:42:26 UTC
Hi, Emilio 

Sorry for my delayed response.

The kernel on which I encountered the I/O sync problem (the first message in
this bug report) was compiled with GCC 3.3.3.


However, comment #3 seems to be an independent issue. I've tried it with 2.4.25
and XFS/Reiserfs/Ext3, with and without LVM, and the problem can be reproduced
under all these conditions.

Here is some output you might find interesting. It happens on my box
irregularly but daily.

############################################################################
tux:/etc/init.d# debsums -s
debsums: checksum mismatch python2.3 file usr/lib/python2.3/Cookie.py

#debsums is a debian file integrity-check tool like tripwire
#Then I copied an intact Cookie.py from another system to diff

tux:/etc/init.d# diff /usr/lib/python2.3/Cookie.py /home/geek/Cookie.py 
467,469c467,468
<  uration state."""
<         if self._defaults:
<             fp.writ                        self.key, repr(self.value) )
---
>         return '<%s: %s=%s>' % (self.__class__.__name__,
>                                 self.key, repr(self.value) )

#Look at the first line:    uration state."""

#Let's run grep -r 'uration state.\"\"\"' in /usr/lib/python2.3/

tux:/usr/lib/python2.3# grep -r 'uration state.\"\"\"' * 
ConfigParser.py:        """Write an .ini-format representation of the configuration state."""
Cookie.py: uration state."""

#The mismatched content is actually a string from another file in the same
#directory.

########################################################################

Really odd, huh?
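
The manual grep step above could be repeated for other damaged files with a
small helper. This is only a sketch, assuming Python; the name
files_containing and its parameters are hypothetical. Unlike grep -r, it looks
only at regular files directly inside the given directory.

```python
import os


def files_containing(fragment, directory, exclude=()):
    """Return names of regular files in directory whose bytes contain fragment."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip the damaged file itself and anything that is not a plain file.
        if name in exclude or not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            if fragment in f.read():
                matches.append(name)
    return matches
```

For the case above, the call would be
files_containing(b'uration state."""', "/usr/lib/python2.3", exclude=("Cookie.py",)),
which lists the candidate files the stray text may have come from.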


I am not sure whether the two problems above are related, or whether I should
submit comment #3 as an independent bug.
Comment 12 Emilio Gargiulo 2004-04-02 09:40:34 UTC
I also think that the #3 case should be filed as an independent bug,
because it does not seem related to the XFS sync problem with kernel 2.6.X.

Now we need to hear from XFS Guru and other LVM and RAID gurus, because I do
not have sufficient skill to find the faulty code.

Hi, XFS Gurus, are you alive?
Thanks
Emilio Gargiulo
Comment 13 Adrian Bunk 2006-12-07 07:46:26 UTC
Is this issue still present in recent 2.6 kernels?
Comment 14 Adrian Bunk 2007-02-17 12:00:19 UTC
Please reopen this bug if it's still present with kernel 2.6.20.
