Bug 219300
Summary: | ext4 corrupts data on a specific pendrive | |
---|---|---|---
Product: | File System | Reporter: | nxe9 (linuxnormaluser)
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4)
Status: | RESOLVED INVALID | |
Severity: | normal | CC: | tytso
Priority: | P3 | |
Hardware: | All | |
OS: | Linux | |
Kernel Version: | | Subsystem: |
Regression: | No | Bisected commit-id: |
Description
nxe9
2024-09-22 15:47:48 UTC
> [11844.111565] Buffer I/O error on device sdb1, logical block 7533568
> EXT4-fs (sdb1): I/O error while writing superblock
Typically, such errors indicate a storage failure, not a filesystem problem.
I strongly suspect your media is broken or damaged and should not be used to store important information.
The easiest way to test it would be to use badblocks with a single pass, using the `-w` (write-mode test) option.
The defaults for `-b` and `-c` are quite low; I'd suggest:
sudo badblocks -b 4096 -c 1000 -w -s -v /dev/sdX
Note that this operation will destroy all your data, and in your case that would be `/dev/sdb`. Please triple-check before running the command to avoid data loss.

> Typically, such errors indicate a storage failure, not a filesystem problem.
> I strongly suspect your media is broken or damaged and should not be used to
> store important information.

How can you explain the fact that I can copy tens of GB of data to the ntfs file system on different operating systems, no errors occur, and the data is always consistent? For me, this is a sign that something is wrong with ext4, since ntfs works without any problems on the same hardware. I've tested badblocks before and there were no errors:

badblocks -w -s -o error.log /dev/sdX

In short, with ext4 I can generate an error very quickly. With ntfs, I was unable to generate it even once.

Ext4 uses a block allocation algorithm which spreads the blocks used by files across the entire storage device in order to reduce file fragmentation. There are cheap thumb drives that claim to be, say, 16GB, but which only have 8GB of flash, and they rely on the fact that some Windows file systems (FAT and NTFS) allocate blocks starting at the low-numbered block numbers. So if there is a fake/scammy USB thumb drive (the kind that you buy in a back alley of Shenzhen, at a deep discount in the checkout line of Microcenter, or from a really dodgy vendor on Amazon Marketplace at a price which is too good to be true), it might work on Windows so long as you don't actually try to store that many files on it.

In any case, the console messages are very clearly I/O errors, and the LBA sector number reported is a high-numbered address: 60278752. Whether this is just a failed thumb drive, or one which is deliberately sold as a fake, is unclear, but I would suggest trying to read and write to all of the sectors of the disk.
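The fake-drive failure mode described above can be sketched in a few lines. This is a hypothetical model, not real controller firmware: a drive that claims 16 blocks but is backed by 8, with high addresses silently wrapping onto low ones, so low-only allocation (FAT/NTFS-style) verifies while spread-out allocation (ext4-style) corrupts earlier writes. All names here are invented for illustration.

```python
# Hypothetical model of a fake flash drive (illustration only): the
# controller claims 16 blocks but is backed by 8, and high block
# addresses silently wrap onto the low ones.
REAL_BLOCKS = 8
CLAIMED_BLOCKS = 16

class FakeDrive:
    def __init__(self):
        self.flash = [None] * REAL_BLOCKS

    def write(self, lba, data):
        self.flash[lba % REAL_BLOCKS] = data  # high LBAs alias low LBAs

    def read(self, lba):
        return self.flash[lba % REAL_BLOCKS]

# FAT/NTFS-style allocation: fill from the low-numbered blocks first.
drive = FakeDrive()
low_lbas = list(range(REAL_BLOCKS))
for lba in low_lbas:
    drive.write(lba, f"data-{lba}")
fat_ok = all(drive.read(lba) == f"data-{lba}" for lba in low_lbas)

# ext4-style allocation: spread writes across the whole claimed range,
# so low and high LBAs that share a physical block collide.
drive = FakeDrive()
spread_lbas = [0, 1, 8, 9]  # LBA 8 aliases 0, LBA 9 aliases 1
for lba in spread_lbas:
    drive.write(lba, f"data-{lba}")
ext4_ok = all(drive.read(lba) == f"data-{lba}" for lba in spread_lbas)

print(fat_ok, ext4_ok)  # low-only writes verify; spread writes collide
```

The point of the sketch: neither filesystem is wrong, but only the one that touches high block numbers exposes the missing flash.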
Fundamentally, ext4 assumes that the storage device is valid; if it is not valid (e.g., has I/O errors when you try to read or write to portions of the disk), that's the storage device's problem, not ext4's.

> and so if there is a fake/scammy USB thumb drive
AliExpress has hundreds of them.
Some are even sold as "2TB" drives when in reality you'll be lucky if they contain 16GB of disk space. Tons of reviews on YouTube as well.
Thank you for your entries. My pendrive is not a Chinese fake, and I don't think its capacity is fake. At least that's what I think. Intenso is a German company, although the chips are probably imported from the Far East.

Back to the topic... I don't know much about file systems, so I'm relying on you. Is it likely that file systems are so different that a hardware bug shows up regularly on one file system but is impossible to reproduce on the other? Besides, the fact is that two pendrives of the same model have the problem, and other models, even from the same manufacturer, do not. If I could see the error on ntfs just once, I wouldn't have a problem, but so far I haven't been able to reproduce the error on ntfs even once.

Today I tested ntfs again with f3 and, as usual, no error. Apart from that, I generated test data and filled the disk completely. As usual, everything was fully consistent on ntfs.

Free space on ext4 according to f3write: Free space: 28.67 GB
Free space on ntfs according to f3write: Free space: 29.23 GB

As you can see, I can write even more data to ntfs and it will not generate errors. I will summarize some points:

- i/o errors in dmesg appear very rarely. During data corruption this error usually does not appear.
- f3 tests on ext4 are negative only sometimes.
- when copying my own files to ext4 I can generate data inconsistency very quickly.
- badblocks doesn't show me any errors.
- ntfs always works great

Therefore, I am still interested in whether one file system can actually hide hardware defects (or is implemented in such a way that they are very difficult to reproduce), or whether the other file system has some rare bug that only becomes visible with this hardware. For me it's not settled.

2 billion Android users use ext4 daily with zero issues. I/O errors must not appear EVER; I repeat, a normally working mass storage device should NEVER produce a single one of them.
In fact, if I get a single I/O error on any of my devices, it instantly gets wiped and thrown in the trash. You can tell a FS that certain blocks are bad, but if you value your sanity you should not be using such storage.

Please ask your question on either https://unix.stackexchange.com/questions or https://superuser.com/questions/ — it does not belong here.

It's not at all surprising that flaky hardware might have issues that are only exposed by different file systems. Different file systems can have very different I/O patterns, both spatially (which blocks get used) and temporally (how many I/O requests are issued in parallel, and how quickly), and in I/O request type (e.g., how many, if any, CACHE FLUSH requests; how many, if any, FORCED UNIT ACCESS -- FUA -- requests).

One quick thing I'd suggest is to experiment with file systems other than ext4 and ntfs. For example, what happens if you use xfs or btrfs or f2fs with your test programs? If the hardware fails with xfs or btrfs, then that would very likely put the finger of blame on the hardware being cr*p.

The other thing you can try is to run tests on the raw hardware. For example, something like this [1] to write random data to the disk and then verify the output. The block device must be able to handle having random data written at high speeds, and when you read back the data, you must get the same data back. Unreasonable, I know, but if the storage device fails with random writes without a file system in the mix, it's going to be hopeless once you add a file system.
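As a rough illustration of such a raw write-then-verify pass: the sketch below is not fio, and `write_verify` and its parameters are hypothetical names. It writes deterministic pseudorandom chunks, forces them to the device with fsync, and reads them back; any mismatch means the device lost or corrupted data. Point it only at a file or device whose contents you are willing to destroy.

```python
# Minimal write-then-verify sketch (hypothetical helper, not fio).
import hashlib
import os

def write_verify(target, total_bytes, chunk=1 << 20, seed=b"verify-seed"):
    """Write deterministic pseudorandom chunks, fsync, read them back.

    Returns the list of chunk indices whose contents did not match.
    """
    def pattern(i):
        # Deterministic per-chunk data derived from the seed and index.
        digest = hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        return (digest * (chunk // len(digest) + 1))[:chunk]

    n_chunks = total_bytes // chunk
    with open(target, "r+b") as f:
        for i in range(n_chunks):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())  # push data past the page cache to the device
        f.seek(0)
        bad = [i for i in range(n_chunks) if f.read(chunk) != pattern(i)]
    return bad
```

On healthy storage (or an ordinary file) this returns an empty list; a non-empty list pins the failure on the device, with no file system in the picture.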
[1] https://github.com/axboe/fio/blob/master/examples/basic-verify.fio

I will note that large companies that buy millions of dollars of hardware, whether for data-center use at hyperscaler cloud companies like Amazon or Microsoft, or for flash devices used in mobile devices from Samsung, Motorola, Google (Pixel devices), etc., will spend an awful lot of time qualifying the hardware to make sure it is high quality before they buy it. And they do this using raw tests against the block device, since this eliminates the excuse from the hardware company that "oh, this must be a file system bug". If failures are found when running storage tests against the raw block device, there is no place for the hardware vendor to hide.

But in general, as Artem said, if there are any I/O failures at all, that's a huge red flag. That essentially *proves* that the hardware is dodgy. You can have dodgy hardware without I/O errors, but if there are I/O errors reading or writing to a valid block/sector number, then by definition the hardware is the problem. And in your case, the errors are "USB disconnect" and "unit is off-line". That should never, ever happen, and if it does, then there is a hardware problem. It could be a cabling problem; it could be a problem with the SCSI/SATA/NVMe/USB controller, etc. But the file system folks will tell you that if there are *any* such problems, resolve the hardware problem before asking the file system people to debug the problem. It's much like asking a civil engineer why a building might have design issues when it's built on top of quicksand. Buildings assume that they are built on stable ground. If the ground is not stable, then choose a different building site or fix the ground first.

OK, thanks. You convinced me.

@Theodore Tso: Thank you for your detailed post. As I wrote in the first post, I tried f2fs once and it also broke the data. This confirms your claims. I tried the "basic-verify.fio".
Unfortunately, this method is not very practical, because in the case of my pendrive the verification time is about 60 days. After 10 hours I stopped; the progress was less than one percent. Another, properly functioning pendrive would also require many days. Perhaps this method would generate an error, but it is very cumbersome. From the perspective of the average user, this is not a good situation, because you can operate on hardware that is not fully functional, not be fully aware of it, and not have an easy and effective method to verify the status of your device. True, you can also buy hardware from a more reputable manufacturer. Unfortunately, there's nothing I can do about it. Well, the only thing I can do is throw this equipment in the trash. Thank you again.

From the user's perspective, it means that you should stick to well-regarded hardware manufacturers, and look on the web for reviews from people who complain about lost data. Then make sure you buy from a reputable vendor, to avoid buying fakes where the vendor claims the drive comes from a well-regarded hardware manufacturer, but it's really a fake with only 16GB of flash to back a claimed 1TB drive, and the moment you write more than 16GB of data, it starts overwriting previously written blocks.

In general, even high-quality storage from well-regarded companies (e.g., Samsung, WDC, etc.) is not all that expensive --- especially compared to the value of the user's time and the value of the user's data. So trying to save money by purchasing the cheapest possible storage is just a false economy. In general, if it's too good to be true... it probably is.

Finally, if Intenso is a reputable manufacturer, you should be able to file a warranty claim and they should be able to replace it with a new storage device. If they are not willing to do that... they probably aren't a reputable manufacturer.