Bug 5821 - XFS unreliable on Alpha (64-bit machine)
Summary: XFS unreliable on Alpha (64-bit machine)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: i386 Linux
Importance: P2 high
Assignee: XFS Guru
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-03 23:19 UTC by R
Modified: 2006-09-22 09:18 UTC

See Also:
Kernel Version: 2.6.15
Tree: Mainline
Regression: ---


Description R 2006-01-03 23:19:53 UTC
Most recent kernel where this bug did not occur: Unknown. Nonexistent?   
Distribution: Gentoo Linux 2005.x  
Hardware Environment:  
CPU: DEC 21164 EV5 at 266 MHz
System: DEC 21171 Alcor  
IEEE1394 card: Creative Labs SB Audigy FireWire Port  
Firewire to IDE bridge: Oxford 911  
Software Environment: Linux 2.6.15, mount 2.12r, xfsprogs 2.6.13 
Partition requiring log replay on the IDE hard drive 
    
Problem Description: 
When I attached and tried to mount the drive, I got: 
    
ieee1394: sbp2: Logged into SBP-2 device    
ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]    
  Vendor: WDC WD80  Model: 0JB-00JJA0        Rev: 05.0    
  Type:   Direct-Access-RBC                  ANSI SCSI revision: 04    
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)    
sdb: asking for cache data failed    
sdb: assuming drive cache: write through    
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)    
sdb: asking for cache data failed    
sdb: assuming drive cache: write through    
sdb: sdb1 sdb2 sdb3 sdb4    
sd 4:0:0:0: Attached scsi disk sdb    
XFS mounting filesystem sdb1    
Starting XFS recovery on filesystem: sdb1 (logdev: internal)    
Unable to handle kernel paging request at virtual address b6db69da5323dc00    
mount(8951): Oops 0    
pc = [<fffffc000048fbc8>]  ra = [<fffffc000049088c>]  ps = 0000    Not tainted    
pc is at xlog_recover_commit_trans+0x4b8/0x19c0    
ra is at xlog_recover_commit_trans+0x117c/0x19c0    
v0 = b6db69da5323dc00  t0 = fffffc0000000000  t1 = 0000aa33f90ee000    
t2 = 3e9c4e70ee000000  t3 = fffffc003e8a7e58  t4 = fffffc0000790000    
t5 = fffffc003e8a7e58  t6 = fffffc003e8a7d28  t7 = fffffc003b9d0000    
s0 = fffffc003c82e5e0  s1 = fffffc003e8a7cf0  s2 = 0000000000000000    
s3 = 0000000000000000  s4 = fffffc003f279800  s5 = 0000000000000000    
s6 = b6db69da5323dc00    
a0 = fffffc003e8a7cf0  a1 = 0000000000001c00  a2 = 0000000000000000    
a3 = 0000000000000000  a4 = 0000000000004001  a5 = fffffc0000700b98    
t8 = 0000000000000000  t9 = 000000410d9b4090  t10= 8200000000000000    
t11= 0000000000000000  pv = fffffc00004ab4d0  at = 000000000000007f    
gp = fffffc000075d400  sp = fffffc003b9d36c8    
Trace:    
[<fffffc00004915e0>] xlog_recover_process_data+0x510/0x670    
[<fffffc00004a8ea8>] kmem_alloc+0x98/0x190    
[<fffffc00004915e0>] xlog_recover_process_data+0x510/0x670    
[<fffffc00004925a8>] xlog_do_recovery_pass+0x4a8/0x7e0    
[<fffffc00004929dc>] xlog_recover+0xfc/0x2e0    
[<fffffc000049299c>] xlog_recover+0xbc/0x2e0    
[<fffffc000048a97c>] xfs_log_mount+0x43c/0x6d0    
[<fffffc0000495444>] xfs_mountfs+0xc14/0x11b0    
[<fffffc00004847b0>] xfs_ioinit+0x50/0x70    
[<fffffc00004ad588>] pagebuf_iostart+0xf8/0x130    
[<fffffc00004929dc>] xlog_recover+0xfc/0x2e0    
[<fffffc000049299c>] xlog_recover+0xbc/0x2e0    
[<fffffc000048a97c>] xfs_log_mount+0x43c/0x6d0    
[<fffffc0000495444>] xfs_mountfs+0xc14/0x11b0    
[<fffffc00004847b0>] xfs_ioinit+0x50/0x70    
[<fffffc00004ad588>] pagebuf_iostart+0xf8/0x130    
[<fffffc000049386c>] xfs_readsb+0x13c/0x6c0    
[<fffffc00004adba0>] xfs_setsize_buftarg_flags+0xc0/0x190    
[<fffffc00004847b0>] xfs_ioinit+0x50/0x70    
[<fffffc000049d884>] xfs_mount+0xa14/0xb70    
[<fffffc00004b617c>] vfs_mount+0x3c/0x60    
[<fffffc00004b5d30>] linvfs_fill_super+0x0/0x3d0    
[<fffffc00004b5e38>] linvfs_fill_super+0x108/0x3d0    
[<fffffc0000383cdc>] get_sb_bdev+0x1cc/0x2b0    
[<fffffc0000382c00>] sget+0x3a0/0x420    
[<fffffc0000383bac>] get_sb_bdev+0x9c/0x2b0    
[<fffffc00003844f4>] sb_set_blocksize+0x34/0xa0    
[<fffffc0000383cbc>] get_sb_bdev+0x1ac/0x2b0    
[<fffffc00004b6120>] linvfs_get_sb+0x20/0x40    
[<fffffc0000384158>] do_kern_mount+0x88/0x1a0    
[<fffffc00003a17d8>] do_mount+0x5c8/0x970    
[<fffffc00003a2004>] sys_mount+0xc4/0x160    
[<fffffc0000316de0>] handle_irq+0x140/0x1c0    
[<fffffc0000317400>] do_entInt+0x110/0x180    
[<fffffc0000355d88>] __alloc_pages+0x68/0x380    
[<fffffc00003560e0>] __get_free_pages+0x40/0xe0    
[<fffffc00003a10a8>] copy_mount_options+0x48/0x1b0    
[<fffffc00003a1fd8>] sys_mount+0x98/0x160    
[<fffffc0000311354>] entSys+0xa4/0xc0    
    
Code: 47ff040c  482052d1  44409002  e4400330  a4200098  4031040f <a02f0000>    
48207621     
    
And then mount segfaults. Any subsequent attempt to access sdb1 hangs the
process making it, including the shutdown script.
    
Then I try putting the drive on an x86 machine, and I get:    
Starting XFS recovery on filesystem: sdb1 (dev: sdb1)    
Ending XFS recovery on filesystem: sdb1 (dev: sdb1)    
XFS mounting filesystem sdb1 
And now someone else is using the drive. 
    
Without mounting, xfs_check and such work, but xfs_repair doesn't want to do   
anything for fear of destroying important stuff in the log. I forgot to try   
fsck, but other people say that doesn't work on 64-bit machines (Sparc), 
either. 
 
With or without trying to mount the XFS partition, mounting the VFAT partition 
happens without incident. 
    
Steps to reproduce:     
1. Get a drive with XFS. 
2. Unmount it uncleanly so that it requires log replay.
3. See it fail to mount on a 64-bit machine unless the log is first replayed
on a 32-bit machine.
Comment 1 Chris Wedgwood 2006-09-20 21:50:36 UTC
log replay isn't endian and/or word-size independent; if the journal is dirty
from i386 it won't replay clean on alpha or similar

is that what is going on here?
Comment 2 Eric Sandeen 2006-09-21 07:40:56 UTC
Hm, well, it still shouldn't oops, it should recognize an invalid log format if
one is found.

However, there were some bugs in this area which have been fixed.  See
http://oss.sgi.com/archives/xfs/2006-05/msg00051.html

I can't remember for sure if this led to an oops (Tim?) (feel free to search the
archives)

Can the original reporter try with a recent kernel?  2.6.15 predates that fix.

Also Tim says this:

> Log replay is certainly not endian independent but it can handle
> different word-size machines nowadays (didn't in the past).
> So you can go between i386 and x86_64 for example, with a dirty log.
> The log formats will be different between the two for some items
> but in recovery with a newer kernel we do conversion on the fly.

> It certainly does sound like that is the issue here, Chris.
> The back trace is just like a typical different log format problem.
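
To illustrate the word-size point in the quote above, here is a minimal C sketch of how one log item can have two layouts and be converted during recovery. The struct and function names (extent_32, extent_64, decode_extents) are hypothetical and are not taken from the XFS source. It assumes a record made of a 64-bit field followed by a 32-bit field, which a 32-bit i386 kernel would lay out in 12 bytes and a 64-bit kernel would pad to 16 bytes, and it assumes both machines share the same byte order. The packed attribute is a GCC/Clang extension used here only to pin the 12-byte layout on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record as a 32-bit kernel would write it: the 64-bit
 * field only needs 4-byte alignment there, so there is no tail padding. */
struct extent_32 {
    uint64_t start;
    uint32_t len;
} __attribute__((packed));          /* 12 bytes */

/* The same record as a 64-bit kernel would write it: padded to 16 bytes. */
struct extent_64 {
    uint64_t start;
    uint32_t len;
    uint32_t pad;
};

/* In-memory form used by the (hypothetical) recovery code. */
struct extent {
    uint64_t start;
    uint32_t len;
};

/* Decode `count` records from `buf`, choosing the layout from the buffer
 * length. Returns 0 on success, -1 if the length matches neither layout,
 * so a foreign format is rejected instead of being misread. */
int decode_extents(const void *buf, size_t buf_len, size_t count,
                   struct extent *out)
{
    const unsigned char *p = buf;
    size_t stride;

    if (count == 0)
        return -1;
    if (buf_len == count * sizeof(struct extent_32))
        stride = sizeof(struct extent_32);
    else if (buf_len == count * sizeof(struct extent_64))
        stride = sizeof(struct extent_64);
    else
        return -1;

    for (size_t i = 0; i < count; i++) {
        /* Both layouts keep start at offset 0 and len at offset 8;
         * only the distance between records differs. memcpy avoids
         * unaligned loads from the 12-byte layout. */
        memcpy(&out[i].start, p + i * stride, sizeof(uint64_t));
        memcpy(&out[i].len, p + i * stride + sizeof(uint64_t),
               sizeof(uint32_t));
    }
    return 0;
}
```

Code that instead casts the buffer straight to its native struct would read every record after the first from the wrong offset when the log was written with the other layout, which is one plausible way to end up following a garbage pointer as in the oops above.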
Comment 3 Timothy Shimmin 2006-09-21 17:08:18 UTC
(Oops. Meant to add to the bug instead of just email.)

Good point, Eric.
If it's an endian difference for a dirty log then it will report it 
and fail to mount.
If it's a word size difference for a dirty log with an old kernel then 
it almost certainly will crash with a similar backtrace to what was seen here.
That shouldn't happen with recent xfs.

--Tim
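
As a rough illustration of the "report it and fail to mount" behaviour described in this comment, the following C sketch classifies a log header by its magic number before anything else in the log is trusted. This is a generic technique, not the actual XFS check: the constant LOG_MAGIC, its value, and the names swap32 and classify_log_magic are assumptions made for the example.

```c
#include <stdint.h>

/* Illustrative magic value; an assumption for this sketch only. */
#define LOG_MAGIC 0xFEEDBABEu

enum log_order { LOG_NATIVE, LOG_SWAPPED, LOG_INVALID };

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Classify a magic number exactly as it was read from disk.
 * LOG_SWAPPED means the log was written in the other byte order;
 * a caller would report that and refuse to replay the log. */
enum log_order classify_log_magic(uint32_t magic_as_read)
{
    if (magic_as_read == LOG_MAGIC)
        return LOG_NATIVE;
    if (swap32(magic_as_read) == LOG_MAGIC)
        return LOG_SWAPPED;
    return LOG_INVALID;
}
```

A mount path built this way would return an error for LOG_SWAPPED or LOG_INVALID rather than continuing into recovery with data it cannot interpret.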
Comment 4 R 2006-09-22 00:07:13 UTC
> Can the original reporter try with a recent kernel?

Regrettably, I can't. My AlphaStation's power supply gave out, my other 
AlphaStation's CPU board is giving hardware error codes, and their parts are 
not interchangeable.
Comment 5 XFS Guru 2006-09-22 09:18:57 UTC
In the absence of further testing, I think it makes sense to assume that this is
the problem that has been fixed in the upstream log code.
