Bug 202127
| Summary: | cannot mount or create XFS on a 597T device | | |
| --- | --- | --- | --- |
| Product: | File System | Reporter: | Manhong Dai (manhongdai) |
| Component: | XFS | Assignee: | FileSystem/XFS Default Virtual Assignee (filesystem_xfs) |
| Status: | RESOLVED PATCH_ALREADY_AVAILABLE | | |
| Severity: | normal | CC: | bugzilla, sandeen |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC 2018 x86_64 GNU/Linux | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | output of "mkfs -t xfs -f /dev/sda" | | |
[root@watsons-s1 ~]# grep sda /proc/partitions
   8        0 597636415488 sda

By the way, the XFS filesystem works without any issue under the environment below:

[root@watsons-s1 ~]# mkfs.xfs -V
mkfs.xfs version 4.17.0
[root@watsons-s1 ~]# uname -a
Linux watsons-s1.mbni.org 4.17.5-1-ARCH #1 SMP PREEMPT Sun Jul 8 17:27:31 UTC 2018 x86_64 GNU/Linux

> = sunit=256 swidth=64 blks
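The quoted geometry is easier to judge in bytes — a quick decode, assuming the 4096-byte filesystem block size (bsize=4096) shown in the attached mkfs output:

# sunit/swidth in the mkfs output are in 4096-byte filesystem blocks
echo $(( 256 * 4096 ))   # sunit  = 1048576 bytes = 1 MiB
echo $((  64 * 4096 ))   # swidth =  262144 bytes = 256 KiB

A 256 KiB stripe width beneath a 1 MiB stripe unit is self-contradictory: the width (all data disks together) must be a multiple of the unit, never smaller than it.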
It's very strange to have automatically arrived at an impossible geometry with swidth < sunit.
Can you provide the output of:
# blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sda
please?
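The same limits are also exposed in sysfs, which can be handy for cross-checking blockdev(8) — a minimal sketch, assuming the device in question is sda:

for f in logical_block_size physical_block_size minimum_io_size optimal_io_size; do
    printf '%-20s %s\n' "$f" "$(cat /sys/block/sda/queue/$f)"
done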
[root@watsons-s1 ~]# blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sda
1195272830976
611979689459712
512
4096
1048576
262144

On Thu, Jan 03, 2019 at 10:18:56PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #3 from daimh@umich.edu ---
> [root@watsons-s1 ~]# blockdev --getsz --getsize64 --getss --getpbsz
> --getiomin --getioopt /dev/sda
> 1195272830976
> 611979689459712
> 512
> 4096
> 1048576
> 262144

Ok, so iomin=1MB and ioopt=256k. That's an invalid block device configuration. If this is coming from hardware RAID (e.g. via a SCSI code page) then this is a firmware bug. If this is coming from a software layer (e.g. lvm, md, etc.) then it may be a bug in one of those layers.

IOWs, we need to know what your storage stack configuration is and what hardware underlies /dev/sda. Can you attach the storage stack and hardware information indicated in this link for us?

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Eric, we probably need to catch this in mkfs when validating blkid information - the superblock verifier is too late to be catching bad config info like this....

Cheers,
Dave.

/dev/sda is a hardware RAID; lspci -vvv output is below. By the way, /dev/sda has been working great with kernel 4.17.5-1-ARCH and mkfs.xfs version 4.17.0. I can create, mount, and copy 200+ T of data without any glitches. Further, if I downgrade the Linux kernel and xfsprogs, /dev/sda works.

02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3316 [Intruder] (rev 01)
    Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9361-16i
    Physical Slot: 3
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 26
    NUMA node: 0
    Region 0: I/O ports at 6000 [size=256]
    Region 1: Memory at c7240000 (64-bit, non-prefetchable) [size=64K]
    Region 3: Memory at c7200000 (64-bit, non-prefetchable) [size=256K]
    Expansion ROM at c7100000 [disabled] [size=1M]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x8 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
    Capabilities: [d0] Vital Product Data
        Not readable
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
        Vector table: BAR=1 offset=0000e000
        PBA: BAR=1 offset=0000f000
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 04000001 0018000f 02010000 89faae40
    Capabilities: [1e0 v1] Secondary PCI Express <?>
    Capabilities: [1c0 v1] Power Budgeting <?>
    Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

What do you get for 'MegaCli64 -AdpAllInfo -aAll', parsed for the firmware revision? Current on the website is 24.22.0-0034, dated Oct 12 2018. Upgrading it carries some risk *shrug* - there might be a changelog you can download (now at Broadcom) to see if anything related to this problem has been fixed.

dchinner: Agreed, we need to validate this and refuse to proceed with nonsense.

daimh: I'm curious, what geometry does mkfs.xfs (or xfs_info) give you when you use xfsprogs-4.17.0?

Ok, looks like firmware package 24.22.0-0034 translates to firmware 4.740.00-8394. I can't parse this file having given it only a two-minute read, so I have no idea which firmware has this change, but I found this:

SCGCQ01150173 - (Port_Complete) - Request for NVdata change on 9361-8i to make the minimum stripe size 16K (instead of 64K)

But you have a 16i, not 8i. OK, whatever. I would just flash it with the latest if it doesn't already have it. And if it's still a problem, file a bug with Broadcom and reference this bug, specifically comment 4.

[root@watsons-s1 ~]# /opt/MegaRAID/storcli/storcli /c0 show | head -n 30
Generating detailed summary of the adapter, it may take a while to complete.

Controller = 0
Status = Success
Description = None

Product Name = AVAGO MegaRAID SAS 9361-16i
Serial Number = SK82367083
SAS Address = 500062b201dbbcc0
PCI Address = 00:02:00:00
System Time = 01/03/2019 18:17:05
Mfg. Date = 06/06/18
Controller Time = 01/03/2019 23:17:05
FW Package Build = 24.19.0-0047
BIOS Version = 6.34.01.0_4.19.08.00_0x06160200
FW Version = 4.720.00-8218
Driver Name = megaraid_sas
Driver Version = 07.706.03.00-rc1
Current Personality = RAID-Mode
Vendor Id = 0x1000
Device Id = 0xCE
SubVendor Id = 0x1000
SubDevice Id = 0x9371
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 2
Device Number = 0
Function Number = 0
Drive Groups = 1

While I am downgrading the server to get the xfs_info in a minute, I don't want to upgrade the firmware, because it worked beautifully under the two old versions of Linux and xfsprogs, and I don't want to get fired. :)
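Until mkfs grows the validation discussed above, a standalone preflight check can catch this class of bogus hint before any filesystem is created — a sketch, with the device path as an assumed argument:

#!/bin/sh
# Refuse to proceed when a device reports an optimal_io_size (ioopt) that
# is nonzero yet smaller than its minimum_io_size (iomin) -- an invalid
# I/O topology, as discussed in this bug.
dev=${1:?usage: $0 /dev/sdX}
iomin=$(blockdev --getiomin "$dev")
ioopt=$(blockdev --getioopt "$dev")
if [ "$ioopt" -ne 0 ] && [ "$ioopt" -lt "$iomin" ]; then
    echo "$dev: bogus geometry from hardware: ioopt=$ioopt < iomin=$iomin" >&2
    exit 1
fi
echo "$dev: iomin=$iomin ioopt=$ioopt look sane"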
On Thu, Jan 03, 2019 at 10:58:41PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #5 from daimh@umich.edu ---
> /dev/sda is a hardware RAID; lspci -vvv output is below. By the way, /dev/sda
> has been working great with kernel 4.17.5-1-ARCH and mkfs.xfs version 4.17.0.
> I can create, mount, and copy 200+ T of data without any glitches. Further,
> if I downgrade the Linux kernel and xfsprogs, /dev/sda works.

Sure, that's because newer kernels and tools are much more stringent about validity checking the on-disk information. And, in this case, newer tools have found a validity problem that the older tools and kernel didn't.

You can use xfs_db to fix the broken alignment in the superblock, but I'd like to get to the bottom of where the problem is coming from first.

> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3316
> [Intruder] (rev 01)

Ok, so it's an LSI/Broadcom 3316 hardware RAID controller, which means this is most likely a firmware bug. Can you update the RAID controller to the latest firmware and see if the block device still reports the same iomin/ioopt parameters?

If so, you need to talk to your vendor about getting their hardware bug fixed and ensure their QA deficiencies are addressed, then use xfs_db to rewrite the stripe unit/stripe width to valid values so you can continue to use the filesystem on modern kernels.

Cheers,
Dave.
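For reference, that xfs_db surgery would look roughly like the sketch below — expert mode on an unmounted filesystem, and the zero values written here are purely illustrative, not a recommendation for this array:

# Print the current on-disk alignment (sb_unit/sb_width, in fs blocks)
xfs_db -r -c 'sb 0' -c 'p unit width' /dev/sda

# Expert mode (-x) allows writes; overwrite the fields with sane values
xfs_db -x -c 'sb 0' -c 'write unit 0' -c 'write width 0' /dev/sda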
[root@watsons-s1 ~]# uname -a
Linux watsons-s1.mbni.org 4.17.5-1-ARCH #1 SMP PREEMPT Sun Jul 8 17:27:31 UTC 2018 x86_64 GNU/Linux
[root@watsons-s1 ~]# mkfs.xfs -V
mkfs.xfs version 4.17.0
[root@watsons-s1 ~]# mkfs -t xfs -f /dev/sda
meta-data=/dev/sda               isize=512    agcount=557, agsize=268434944 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=149409103872, imaxpct=1
         =                       sunit=256    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I will think about the firmware upgrade tomorrow. :)

(In reply to daimh from comment #12)
> [root@watsons-s1 ~]# mkfs.xfs -V
> mkfs.xfs version 4.17.0
> [root@watsons-s1 ~]# mkfs -t xfs -f /dev/sda

Thanks. That's what I suspected, i.e. older xfsprogs just didn't validate or notice this problem, but I was too lazy to check the history. ;) Thanks for the test and the info.

I just upgraded the firmware from 24.19.0-0047 to the latest 24.22.0-0034, and also upgraded the kernel to 4.20.0-arch1-1-ARCH with xfsprogs 4.19.0; the problem persists.

[root@watsons-s1 ~]# /opt/MegaRAID/storcli/storcli /c0 show all | grep -i firmware
Firmware Package Build = 24.22.0-0034
Firmware Version = 4.740.00-8394
Support PD Firmware Download = No
[root@watsons-s1 ~]# uname -a
Linux watsons-s1.mbni.org 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC 2018 x86_64 GNU/Linux
[root@watsons-s1 ~]# mkfs.xfs -V
mkfs.xfs version 4.19.0
[root@watsons-s1 ~]# blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sda
1195272830976
611979689459712
512
4096
1048576
262144

I have had a few back-and-forths with Broadcom support. He checked some logs generated by their standard error-check script, and the logs look fine to him. He asked me to run mkfs.ext4 on the block device; it failed with the error message "mkfs.ext4: Size of device (0x22c97a0000 blocks) /dev/sda too big to create a filesystem using a blocksize of 4096."

Then he asked me if I could create XFS and ext4 on a smaller partition. I fdisked the block device into two smaller partitions, one 2T and the other 50T. mkfs.ext4 worked on both partitions, but mkfs.xfs still reports the same error. Then I was told to contact you guys again. Here is his latest reply:

"I think it is a xfs issue here. Can you go back to the developer? What were the changes from 4.17 to 4.19? If the problem is caused by our firmware, ext4 would have failed."

Broadcom is simply wrong - ext4 doesn't care about or use the stripe geometry like XFS does. But that is beside the point, because:

It is not valid to have a preferred I/O size smaller than the minimum I/O size - this should be obvious to the vendor. We can detect this at mkfs time and error out or ignore the bad values, but there can be no debate about whether the hardware is returning nonsense. It /is/ returning nonsense. That's a firmware bug.

Pose the question to Broadcom: "How can the preferred IO size be less than the minimum allowable IO size?" Because that's what the hardware is telling us.

-Eric
On Fri, Jan 04, 2019 at 10:02:58PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #16 from Eric Sandeen (sandeen@sandeen.net) ---
> Pose the question to broadcom:
> "How can the preferred IO size be less than the minimum allowable IO size?

Just to clarify, the "minio" being reported here is not the "minimum allowable IO size". The minimum allowed IO size is the logical sector/block size of the device. "minimum_io_size" is badly named - it's actually the smallest IO size alignment that allows for /efficient/ IO operations to be performed by the device, and that's typically very different to the logical_block_size of the device. e.g.:

$ cat /sys/block/sdf/queue/hw_sector_size
512
$ cat /sys/block/sdf/queue/logical_block_size
512
$ cat /sys/block/sdf/queue/physical_block_size
4096
$ cat /sys/block/sdf/queue/minimum_io_size
4096
$

So, we can do 512 byte sector IOs to this device, but it's not efficient due to it having a physical 4k block size, i.e. it requires a RMW cycle to do a 512 byte write. IOWs, a 4k IO (minimum_io_size) will avoid physical block RMW cycles, as the physical block size of the storage is 4k. That's what "minimum efficient IO size" means.

For a RAID5/6 lun, this is typically the chunk size, as many RAID implementations can do single-chunk-aligned writes efficiently via partial stripe recalculation without needing RMW cycles. If the write partially overlaps chunks, then RMW cycles are required for RAID recalc, hence setting the RAID chunk size as the "minimum_io_size" makes sense.

However, a device may not be efficient and reach saturation when fed lots of minimum_io_size requests. That's where optimal_io_size comes in - a lot of SSDs out there have an optimal IO size in the range of 128-256KB because they can't reach max throughput when smaller IO sizes are used (iops bound). I.e. the optimal IO size is the size of the IO that will allow the entire bandwidth of the device to be effectively utilised.

For a RAID5/6 lun, the optimal IO size is the one that keeps all disk heads moving sequentially and in synchronisation and doesn't require partial stripe writes (and hence RMW cycles) to occur. IOWs, it's the IO alignment and size that will allow full stripe writes to be sent to the underlying device.

By definition, the optimal_io_size is /always/ >= minimum_io_size. If the optimal_io_size is < minimum_io_size, then one of them is incorrectly specified. The only time this does not hold true is when the device does not set an optimal_io_size, in which case it should be zero and then gets ignored by userspace.

Regardless, what still stands here is that the firmware needs fixing, and that is only something Broadcom can fix.

Cheers,
Dave.

Thanks for the clarification, Dave, I shouldn't have made that mistake. New question for Broadcom: "How can the preferred IO size be smaller than the minimum efficient IO size?"

"I will check with engineering next week and get back to you." This is the latest reply from Broadcom support. I will keep you guys updated. Have a great weekend!

On Fri, 04 Jan 2019 21:59:11 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #15 from daimh@umich.edu ---
> I have had a few back-and-forths with Broadcom support. [...]

By the way, does making an LV atop this device work? Then making an FS in this LV?

Still failed:

# mkfs.xfs /dev/vg/lv &> log
# head log
SB stripe unit sanity check failed
Metadata corruption detected at 0x5571f8b8d4a7, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
cache_node_purge: refcount was 1, not zero (node=0x5571f9936160)
SB stripe unit sanity check failed

An update is that the Broadcom engineer cannot reproduce the issue with Arch Linux kernel 4.20.3-arch1-1-ARCH and mkfs.xfs version 4.19.0. The difference is that his RAID-60 is 7T and 2T, while mine is 556T. I upgraded both the kernel and mkfs to the same versions as above; the problem persists. I downgraded them to linux-4.17.14.arch1-1 and xfsprogs-4.17.0-1 again, and XFS could then be created and mounted without any glitch.

"An update is that Broadcom engineer cannot reproduce the issue with Arch Linux kernel 4.20.3-arch1-1-ARCH and mkfs.xfs version 4.19.0." Which issue is that? Failed mkfs, or nonsensical geometry?

BTW, I hope we made it clear that you can (probably?) move forward by manually specifying sunit & swidth to override the junk values from the hardware.
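Deriving the override values from the array layout is mostly arithmetic — a sketch assuming the RAID60 layout shown in the next comment (1 MiB strip, 19 drives per span, hence 17 data drives per span after RAID6's two parity drives):

# su = the hardware strip (chunk) size; sw = data-bearing drives per span
# RAID6 span of 19 drives loses 2 to parity: 19 - 2 = 17 data drives
mkfs.xfs -d su=1m,sw=17 /dev/sda

# Full-stripe (optimal) I/O this geometry implies: 1 MiB * 17 = 17 MiB,
# versus the 256 KiB the firmware was reporting as ioopt
echo $(( 1024 * 1024 * 17 ))    # 17825792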
Thanks a lot for reminding me su&sw, I just did this and mkfs.xfs worked # /opt/MegaRAID/storcli/storcli /c0 /v0 show all | grep "Strip\|Drives\|RAID" 0/0 RAID60 Optl RW Yes RAWBD - ON 556.591 TB Strip Size = 1.0 MB Number of Drives Per Span = 19 # mkfs.xfs -d su=1m,sw=17 /dev/sda # mount /dev/sda /treehouse/watsons_lab/s1/ But I still need help from you guys. What about my other storage servers that have a huge XFS system created under older Kernel 4.17.* and xfsprogs 4.17 ? I cannot mount it under the latest ArchLinux, and those filesystem are quite full.. Will it mount with "mount -o noalign" or "mount -o sunit=$VALUE1,swidth=$VALUE2" ? mount -o noalign or mount -o sunit=,swidth doesn't work. Here is the commands. ########old kernel [root@watsons-s1 ~]# uname -a Linux watsons-s1.mbni.org 4.17.14-arch1-1-ARCH #1 SMP PREEMPT Thu Aug 9 11:56:50 UTC 2018 x86_64 GNU/Linux [root@watsons-s1 ~]# mkfs.xfs -V mkfs.xfs version 4.17.0 [root@watsons-s1 ~]# mkfs.xfs -f /dev/sda meta-data=/dev/sda isize=512 agcount=557, agsize=268434944 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=149409103872, imaxpct=1 = sunit=256 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=521728, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 ############new kernel [root@watsons-s1 ~]# uname -a Linux watsons-s1.mbni.org 4.20.3-arch1-1-ARCH #1 SMP PREEMPT Wed Jan 16 22:38:58 UTC 2019 x86_64 GNU/Linux [root@watsons-s1 ~]# mkfs.xfs -V mkfs.xfs version 4.19.0 [root@watsons-s1 ~]# mount -t xfs -o noalign /dev/sda /treehouse/watsons_lab/s1/ mount: /treehouse/watsons_lab/s1: mount(2) system call failed: Structure needs cleaning. [root@watsons-s1 ~]# dmesg [ 289.750470] XFS (sda): SB stripe unit sanity check failed [ 289.750564] XFS (sda): Metadata corruption detected at xfs_sb_read_verify+0x106/0x180 [xfs], xfs_sb block 0xffffffffffffffff [ 289.752487] XFS (sda): Unmount and run xfs_repair [ 289.753980] XFS (sda): First 128 bytes of corrupted metadata buffer: [ 289.755393] 00000000: 58 46 53 42 00 00 10 00 00 00 00 22 c9 7a 00 00 XFSB.......".z.. [ 289.756787] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 289.758168] 00000020: 85 3d da cd 25 7c 48 e3 a7 56 b2 4c 8c d2 d5 35 .=..%|H..V.L...5 [ 289.759551] 00000030: 00 00 00 11 60 00 00 08 00 00 00 00 00 00 08 00 ....`........... [ 289.760939] 00000040: 00 00 00 00 00 00 08 01 00 00 00 00 00 00 08 02 ................ [ 289.762308] 00000050: 00 00 00 01 0f ff fe 00 00 00 02 2d 00 00 00 00 ...........-.... [ 289.763670] 00000060: 00 07 f6 00 bd a5 10 00 02 00 00 08 00 00 00 00 ................ [ 289.765016] 00000070: 00 00 00 00 00 00 00 00 0c 0c 09 03 1c 00 00 01 ................ [ 289.766432] XFS (sda): SB validate failed with error -117. [root@watsons-s1 ~]# mount -t xfs -o sunit=256,swidth=64 /dev/sda /treehouse/watsons_lab/s1/ mount: /treehouse/watsons_lab/s1: wrong fs type, bad option, bad superblock on /dev/sda, missing codepage or helper program, or other error. [root@watsons-s1 ~]# dmesg [ 316.159473] XFS (sda): stripe width (64) must be a multiple of the stripe unit (256) It may still fail, but you've specified your sunit * swidth incorrectly (backwards) [ 316.159473] XFS (sda): stripe width (64) must be a multiple of the stripe unit (256) From the manpage: sunit=value and swidth=value Used to specify the stripe unit and width for a RAID device or a stripe volume. 
"value" must be specified in 512-byte block units. These options are only relevant to filesystems that were created with non-zero data alignment parameters. The sunit and swidth parameters specified must be compati‐ ble with the existing filesystem alignment characteris‐ tics. In general, that means the only valid changes to sunit are increasing it by a power-of-2 multiple. Valid swidth values are any integer multiple of a valid sunit value. The "sunit=256,swidth=64" comes from the output of mkfs.xfs in old kernel. I actually tried 'sunit=256,swidth=256', the dmesg error is exactly the same as 'noalign' (In reply to daimh from comment #28) > The "sunit=256,swidth=64" comes from the output of mkfs.xfs in old kernel. Yes, that's the root-cause bug causing all this pain. > I actually tried 'sunit=256,swidth=256', the dmesg error is exactly the same > as 'noalign' Ok, we need to sort out a way to efficiently rewrite or remove the geometry from those old filesystems, then. It's probably going to involve some xfs_db surgery. Or, actually - I'm not sure if this is an option for you, but mounting with new and /correct/ sunit/swidth values on the older kernel should rewrite them on disk, and make them pass muster with the new kernel. Yes, this trick fixed the problem. Now the filesystem created under old kernel can be mounted under the latest ArchLinux. Thanks a million for your help! On Tue, Jan 22, 2019 at 12:49:31PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=202127 > > --- Comment #22 from daimh@umich.edu --- > An update is that Broadcom engineer cannot reproduce the issue with Arch > Linux > kernel 4.20.3-arch1-1-ARCH and mkfs.xfs version 4.19.0. The difference is his > raid-60 is 7T and 2T, while mine is 556T. IOWs, they didn't actually test your configuration, and so they didn't see the problem you are seeing. Which implies they've got a problem in their firmware where something overflows at a larger size than 7TB. If that were my raid card, I'd be tearing strips off the support engineer's manager by now... Cheers, Dave. Agreed. I don't think we can possibly state clearly enough that the firmware is buggy. We can help you work around it, and add some things to the utilities to catch it, but at the end of the day, your firmware /is/ buggy and your vendor doesn't seem to understand that fact. Good luck. :) (In reply to daimh from comment #22) > An update is that Broadcom engineer cannot reproduce the issue with Arch > Linux kernel 4.20.3-arch1-1-ARCH and mkfs.xfs version 4.19.0. The difference > is his raid-60 is 7T and 2T, while mine is 556T. That's unacceptable customer service. It's lazy, amateurish, and insulting. You've done their homework for them by tracking down this bug, they need to put more effort into providing an actual fix. I'd file a warranty claim regardless of whether it's currently under warranty or not, it's always been defective from the outset. I tried sending them a nastygram through corporate feedback, but that fails. Guess I'll send it by Twitter. Another update is that Broadcom engineer emailed me last night that they are investigating. By reading this discussion, I just enjoyed the great benefit from you guys! I'm just a newbie and thanks! Today another new machine had the same problem. I fixed it by upgrading the firmware. 
Today another new machine had the same problem. I fixed it by upgrading the firmware.

#### Here is the info before the update
# /opt/MegaRAID/storcli/storcli /c0 show all | grep ^Firmware
Firmware Package Build = 24.19.0-0049
Firmware Version = 4.720.00-8220
# blockdev --getiomin --getioopt /dev/sdc
1048576
262144
# mkfs -t xfs /dev/sdc
meta-data=/dev/sdc               isize=512    agcount=262, agsize=268434944 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=70314098688, imaxpct=1
         =                       sunit=256    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
SB stripe unit sanity check failed
Metadata corruption detected at 0x56324656db89, xfs_sb block 0x0/0x1000
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1000
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
SB stripe unit sanity check failed
Metadata corruption detected at 0x56324656db89, xfs_sb block 0x0/0x1000
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1000
mkfs.xfs: writing AG headers failed, err=117

#### Then I upgraded with the command below and rebooted
# /opt/MegaRAID/storcli/storcli /c0 download file=mr3316fw.rom

#### Here is the info after the update
# /opt/MegaRAID/storcli/storcli /c0 show all | grep -i firmware
Firmware Package Build = 24.22.0-0071
Firmware Version = 4.740.00-8452
# blockdev --getiomin --getioopt /dev/sdc
262144
262144
#

Great, thanks for the update on the resolution. FWIW, this was in the changelogs :)

SCGCQ02027889 - Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20

so presumably it was an intentional fix. :)
Created attachment 280259 [details]
output of "mkfs -t xfs -f /dev/sda"

mkfs.xfs version 4.19.0

While mounting an XFS filesystem created under kernel 4.17.0, the error is:

# mount /dev/sda /treehouse/watsons_lab/s1/
mount: /treehouse/watsons_lab/s1: mount(2) system call failed: Structure needs cleaning.
# dmesg
[  397.225921] XFS (sda): SB stripe unit sanity check failed
[  397.226007] XFS (sda): Metadata corruption detected at xfs_sb_read_verify+0x106/0x180 [xfs], xfs_sb block 0xffffffffffffffff
[  397.226090] XFS (sda): Unmount and run xfs_repair
[  397.226126] XFS (sda): First 128 bytes of corrupted metadata buffer:
[  397.226173] 00000000: 58 46 53 42 00 00 10 00 00 00 00 22 c9 7a 00 00  XFSB.......".z..
[  397.226228] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  397.226283] 00000020: c3 44 e5 14 dd 09 45 69 b0 7d 3f 9b 4b 0a fa d9  .D....Ei.}?.K...
[  397.226337] 00000030: 00 00 00 11 60 00 00 08 00 00 00 00 00 00 08 00  ....`...........
[  397.226392] 00000040: 00 00 00 00 00 00 08 01 00 00 00 00 00 00 08 02  ................
[  397.226474] 00000050: 00 00 00 01 0f ff fe 00 00 00 02 2d 00 00 00 00  ...........-....
[  397.226530] 00000060: 00 07 f6 00 bd a5 10 00 02 00 00 08 00 00 00 00  ................
[  397.226585] 00000070: 00 00 00 00 00 00 00 00 0c 0c 09 03 1c 00 00 01  ................
[  397.226651] XFS (sda): SB validate failed with error -117.

# mkfs -t xfs -f /dev/sda &> mkfs.log
# the log file is attached