Description of problem:

On s390x, I hit an xfs/490 failure (I can't reproduce it on x86_64). Debugging manually, I found:

# mkfs.xfs -f -m finobt=0 /dev/loop1
meta-data=/dev/loop1             isize=512    agcount=4, agsize=786496 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3145984, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount /dev/loop1 /mnt/testarea/scratch/
# mkdir /mnt/testarea/scratch/dir
# xfs_io -fc "pwrite 0 4096" -c fsync /mnt/testarea/scratch/dir/testfile
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0000 sec (150.240 MiB/sec and 38461.5385 ops/sec)
# stat -c %i /mnt/testarea/scratch/dir/testfile
132
# umount /dev/loop1
# _scratch_xfs_db -c "convert inode 132 agno"
0x0 (0)
# _scratch_xfs_get_metadata_field "recs[1].freecount" "agi 0" "addr root"
-197
]# xfs_db -c "agi 0" -c "addr root" -c "print recs[1]" /dev/loop1
recs[1] = [startino,holemask,count,freecount,free]
1:[128,0,64,-197,0xffffffffffffffe0]

Version-Release number of selected component (if applicable):
kernel 4.18
xfsprogs 4.19.0-rc0

How reproducible:
100% on s390x with a loop device (at least in my testing)

Steps to Reproduce:
Run xfs/490 on s390x.

Actual results:
As above.

Expected results:
The test passes.

Additional info:
I don't think this is a kernel problem; the negative number may not really be on disk, because the SCRATCH_DEV can still be mounted without errors:

[root@ibm-z-110 xfstests]# mount /dev/loop1 /mnt/testarea/scratch
[root@ibm-z-110 xfstests]# dmesg|tail
[ 7289.790976] XFS (loop1): Mounting V5 Filesystem
[ 7289.796601] XFS (loop1): Ending clean mount

And it's not reproducible on x86_64:

# xfs_db -c "agi 0" -c "addr root" -c "print recs[1]" /dev/loop1
recs[1] = [startino,holemask,count,freecount,free]
1:[128,0,64,59,0xffffffffffffffe0]
Created attachment 279077
xfs metadump
On Wed, Oct 17, 2018 at 11:20:42AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> # _scratch_xfs_get_metadata_field "recs[1].freecount" "agi 0" "addr root"
> -197
> ]# xfs_db -c "agi 0" -c "addr root" -c "print recs[1]" /dev/loop1
> recs[1] = [startino,holemask,count,freecount,free]
> 1:[128,0,64,-197,0xffffffffffffffe0]
.....
> And it's not reproducible on x86_64:
> # xfs_db -c "agi 0" -c "addr root" -c "print recs[1]" /dev/loop1
> recs[1] = [startino,holemask,count,freecount,free]
> 1:[128,0,64,59,0xffffffffffffffe0]

-197 = -(256 - 59)

This looks like a sign extension problem in the xfs_db code. s390 is a big endian system, right?

Cheers,

Dave.
Yep, does this fix it?

diff --git a/db/btblock.c b/db/btblock.c
index cbd2990..5a5b061 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -513,7 +513,7 @@ const field_t inobt_sprec_flds[] = {
 	{ "holemask", FLDT_UINT16X, OI(ROFF(ir_u.sp.ir_holemask)), C1, 0,
 	  TYP_NONE },
 	{ "count", FLDT_UINT8D, OI(ROFF(ir_u.sp.ir_count)), C1, 0, TYP_NONE },
-	{ "freecount", FLDT_INT8D, OI(ROFF(ir_u.sp.ir_freecount)), C1, 0,
+	{ "freecount", FLDT_UINT8D, OI(ROFF(ir_u.sp.ir_freecount)), C1, 0,
 	  TYP_NONE },
 	{ "free", FLDT_INOFREE, OI(ROFF(ir_free)), C1, 0, TYP_NONE },
 	{ NULL }
(In reply to Eric Sandeen from comment #3)
> Yep, does this fix it?

Yes, this helps.

# xfs_db -c "agi 0" -c "addr root" -c "print recs[1]" /dev/loop1
recs[1] = [startino,holemask,count,freecount,free]
1:[64,0,64,59,0xffffffffffffffe0]

>
> diff --git a/db/btblock.c b/db/btblock.c
> index cbd2990..5a5b061 100644
> --- a/db/btblock.c
> +++ b/db/btblock.c
> @@ -513,7 +513,7 @@ const field_t inobt_sprec_flds[] = {
> 	{ "holemask", FLDT_UINT16X, OI(ROFF(ir_u.sp.ir_holemask)), C1, 0,
> 	  TYP_NONE },
> 	{ "count", FLDT_UINT8D, OI(ROFF(ir_u.sp.ir_count)), C1, 0, TYP_NONE
> },
> -	{ "freecount", FLDT_INT8D, OI(ROFF(ir_u.sp.ir_freecount)), C1, 0,
> +	{ "freecount", FLDT_UINT8D, OI(ROFF(ir_u.sp.ir_freecount)), C1, 0,
> 	  TYP_NONE },
> 	{ "free", FLDT_INOFREE, OI(ROFF(ir_free)), C1, 0, TYP_NONE },
> 	{ NULL }
So zorro correctly points out that big vs. little endian certainly should not matter for this u8.

What does matter is the signed type, because getbitval is doing tricks to try to handle sign extension, and it does it differently for big vs. little endian:

		if (getbit_l(p, bit + i)) {
			/* If the last bit is on and we care about sign
			 * bits and we don't have a full 64 bit
			 * container, turn all bits on between the
			 * sign bit and the most sig bit.
			 */

			/* handle endian swap here */
#if __BYTE_ORDER == LITTLE_ENDIAN
			if (i == 0 && signext && nbits < 64)
				rval = (~0ULL) << nbits;
			rval |= 1ULL << (nbits - i - 1);
#else
			if ((i == (nbits - 1)) && signext && nbits < 64)
				rval |= ((~0ULL) << nbits);
			rval |= 1ULL << (nbits - i - 1);
#endif

Switching the field to FLDT_UINT8D makes "signext" false, so none of this happens, but that papers over the underlying bug with signed types.

The bug seems to be the big endian test, if ((i == (nbits - 1)) ...): it tests the last / rightmost bit in the number, which is /not/ the MSB. But I cannot seem to wrap my head around the right way to fix it, yet.