Bug 63981
| Summary: | Bad: Buffer I/O errors make disk unusable | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Giuseppe Scalzi (scalg1) |
| Component: | Serial ATA | Assignee: | fs_ext4 (fs_ext4) |
| Status: | RESOLVED INVALID | | |
| Severity: | high | CC: | tytso |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 3.12.0-rc6 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | dmesg of the errors causing the problem | | |

From the errors listed in the dmesg, it looks like this is a hardware problem with the SSD, not an ext4 bug.

I'd suggest doing a full backup of your disk while you still can, and try replacing the SSD....

(In reply to Theodore Tso from comment #1)
> From the errors listed in the dmesg, it looks like this is a hardware problem
> with the SSD, not an ext4 bug.
>
> I'd suggest doing a full backup of your disk while you still can, and try
> replacing the SSD....

That's strange, because I bought the laptop two weeks ago, used Windows for the first week, and everything worked fine. I have had this problem since the first day after installing Linux. Is it possible to check for hardware errors with smartctl?

    === START OF INFORMATION SECTION ===
    Device Model:     SAMSUNG MZNTD256HAGL-00000
    Serial Number:    S15ZNYAD730814
    LU WWN Device Id: 5 002538 5000648f8
    Firmware Version: DXT2300Q
    User Capacity:    256,060,514,304 bytes [256 GB]
    Sector Size:      512 bytes logical/physical
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   8
    ATA Standard is:  ATA-8-ACS revision 4c
    Local Time is:    Mon Oct 28 23:47:51 2013 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status: (0x00)
        Offline data collection activity was never started.
        Auto Offline Data Collection: Disabled.
    Self-test execution status: ( 0)
        The previous self-test routine completed without error
        or no self-test has ever been run.
    Total time to complete Offline data collection: (53956) seconds.
    Offline data collection capabilities: (0x53)
        SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new command.
        No Offline surface scan supported.
        Self-test supported.
        No Conveyance Self-test supported.
        Selective Self-test supported.
    SMART capabilities: (0x0003)
        Saves SMART data before entering power-saving mode.
        Supports SMART auto save timer.
    Error logging capability: (0x01)
        Error logging supported.
        General Purpose Logging supported.
    Short self-test routine recommended polling time: ( 2) minutes.
    Extended self-test routine recommended polling time: ( 40) minutes.
    SCT capabilities: (0x003d)
        SCT Status supported.
        SCT Error Recovery Control supported.
        SCT Feature Control supported.
        SCT Data Table supported.

    SMART Attributes Data Structure revision number: 1
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0033 100   100   010    Pre-fail Always  -           0
      9 Power_On_Hours          0x0032 099   099   000    Old_age  Always  -           118
     12 Power_Cycle_Count       0x0032 099   099   000    Old_age  Always  -           143
    177 Wear_Leveling_Count     0x0013 099   099   000    Pre-fail Always  -           1
    179 Used_Rsvd_Blk_Cnt_Tot   0x0013 100   100   010    Pre-fail Always  -           0
    181 Program_Fail_Cnt_Total  0x0032 100   100   010    Old_age  Always  -           0
    182 Erase_Fail_Count_Total  0x0032 100   100   010    Old_age  Always  -           0
    183 Runtime_Bad_Block       0x0013 100   100   010    Pre-fail Always  -           0
    187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
    190 Airflow_Temperature_Cel 0x0032 061   030   000    Old_age  Always  -           39
    195 Hardware_ECC_Recovered  0x001a 200   200   000    Old_age  Always  -           0
    199 UDMA_CRC_Error_Count    0x003e 100   100   000    Old_age  Always  -           0
    235 Unknown_Attribute       0x0012 099   099   000    Old_age  Always  -           52
    241 Total_LBAs_Written      0x0032 099   099   000    Old_age  Always  -           851312137

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged. [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
      255        0    65535  Read_scanning was never started
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.

It's possible there is some kind of compatibility issue with the SATA driver on your Sony VAIO, but the point is that with errors like these:

    [13546.661310] ata4.00: failed command: WRITE FPDMA QUEUED
    [13546.661315] ata4.00: cmd 61/08:00:2f:1d:0a/00:00:00:00:00/40 tag 0 ncq 4096 out
    [13546.661315]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
    [13546.661318] ata4.00: status: { DRDY }
    [13546.661319] ata4.00: failed command: WRITE FPDMA QUEUED
    [13546.661323] ata4.00: cmd 61/08:08:27:1d:0a/00:00:00:00:00/40 tag 1 ncq 4096 out
    [13546.661323]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    [13546.661325] ata4.00: status: { DRDY }

... and like these:

    [13606.886264] EXT4-fs warning (device sda1): ext4_end_bio:316: I/O error writing to inode 2097393 (offset 0 size 0 starting block 82853)
    [13606.886267] Buffer I/O error on device sda1, logical block 82845
    [13606.886268] sd 3:0:0:0: [sda] Unhandled error code
    [13606.886270] sd 3:0:0:0: [sda]
    [13606.886271] Result: hostbyte=0x04 driverbyte=0x00
    [13606.886273] sd 3:0:0:0: [sda] CDB:
    [13606.886274] cdb[0]=0x2a: 2a 00 00 00 00 3f 00 00 08 00
    [13606.886282] sd 3:0:0:0: [sda] Unhandled error code
    [13606.886283] Buffer I/O error on device sda1, logical block 0
    [13606.886285] lost page write due to I/O error on sda1
    [13606.886288] sd 3:0:0:0: [sda]
    [13606.886289] Result: hostbyte=0x04 driverbyte=0x00
    [13606.886293] sd 3:0:0:0: [sda] CDB:
    [13606.886294] EXT4-fs error (device sda1): ext4_journal_check_start:56:
    [13606.886294] cdb[0]=0x2a: 2a 00 ...

there's little that we can do at the ext4 level. Basically, the disk device (or the Sony VAIO's SATA chipset) is refusing to talk to Linux.

The Sony VAIO has, historically, been notorious for using Windows-specific hardware that doesn't work well with Linux. I don't know anything about your specific model, but there have been enough problems in the past that I avoid Sony laptops like the plague if I intend to use Linux on them. It's not by accident that most Linux kernel developers tend to use Lenovo Thinkpads...

BTW, I'm using a 512GB Samsung 840 PRO (2.5" SATA SSD) and a 240GB Intel 525 SSD (mSATA) on my Lenovo T430s, and they both work like a charm.

Hmm... I wasn't able to get detailed specs on your SAMSUNG MZNTD256HAGL-00000, but upon doing some further research, it appears to be a new-fangled M.2 PCIe interface. So it's neither an mSATA nor a 2.5" SATA interface, but Something New.

So whether this is a Linux bug or an implementation bug in this new Samsung part (or a failure in the standardization of this new M.2 PCIe interface), I can't say, but it looks like the most likely cause is a problem with this new SSD or its new M.2 interface[1].

[1] http://en.wikipedia.org/wiki/Next_Generation_Form_Factor

(In reply to Theodore Tso from comment #4)
> BTW, I'm using a 512GB Samsung 840 PRO (2.5" SATA SSD) and a 240GB Intel
> 525 SSD (mSATA) on my Lenovo T430s, and they both work like a charm.
>
> Hmm... I wasn't able to get detailed specs on your SAMSUNG
> MZNTD256HAGL-00000, but upon doing some further research, it appears to be a
> new-fangled M.2 PCIe interface. So it's neither an mSATA nor a 2.5" SATA
> interface, but Something New.
>
> So whether this is a Linux bug or an implementation bug in this new
> Samsung part (or a failure in the standardization of this new M.2 PCIe
> interface), I can't say, but it looks like the most likely cause is a
> problem with this new SSD or its new M.2 interface[1].
>
> [1] http://en.wikipedia.org/wiki/Next_Generation_Form_Factor

OK, thank you for your reply. I understand that this isn't a problem related to ext4. I noticed that the Arch Linux wiki page for my laptop model (https://wiki.archlinux.org/index.php/Sony_Vaio_Pro_SVP-1x21) suggests using this option:

- When booting from USB, append libata.force=noncq to the kernel parameters to avoid problems with the SSD.

Well, they say "when booting from USB", but I'll try "libata.force=noncq" anyway. We will see what happens.
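
For anyone trying the same workaround, it can help to confirm that the option actually reached the running kernel. The following is a minimal sketch in Python, not part of the original report: the function names `libata_force_values` and `ncq_disabled` are made up for illustration, and the parsing assumes the comma-separated `[ID:]VAL` form of the `libata.force` parameter.

```python
from typing import List, Tuple


def libata_force_values(cmdline: str) -> List[Tuple[str, str]]:
    """Return (target, setting) pairs for every libata.force= option.

    `target` is the optional port/device ID (for example "1.00") and is an
    empty string when the setting applies to all ports.
    """
    pairs = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "libata.force" or not sep:
            continue
        for item in value.split(","):
            if not item:
                continue
            # "1.00:noncq" -> ("1.00", "noncq"); "noncq" -> ("", "noncq")
            target, _, setting = item.rpartition(":")
            pairs.append((target, setting))
    return pairs


def ncq_disabled(cmdline: str) -> bool:
    """True if any libata.force entry asks for NCQ to be turned off."""
    return any(setting == "noncq" for _, setting in libata_force_values(cmdline))


if __name__ == "__main__":
    # Hypothetical command line; on a real system, read /proc/cmdline instead.
    sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro libata.force=noncq"
    print(libata_force_values(sample))
    print(ncq_disabled(sample))
```

To check a live system, pass the contents of /proc/cmdline to `ncq_disabled`. The kernel should also log whether NCQ is in use when it configures the drive, so the dmesg output after boot is worth a look as well.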
Created attachment 112591
dmesg of the errors causing the problem

While I am using my laptop, the SSD suddenly becomes unusable. The disk ends up mounted read-only, and the only way to get it working again is to reboot. During the reboot, the file system check fixes the errors, and I can use the laptop for a few hours until the problem appears again.

This problem is difficult to reproduce because there are no precise steps that trigger the I/O errors shown in the attached dmesg. I had the same problem with kernels 3.11.5 and 3.10.6.

I use a Sony VAIO Pro (Sony Corporation SVP1321C5E/VAIO, BIOS R1040V7 09/09/2013).

============================
Information about my system:

    bash-4.2# cat /proc/scsi/scsi
    Attached devices:
    Host: scsi3 Channel: 00 Id: 00 Lun: 00
      Vendor: ATA Model: SAMSUNG MZNTD256 Rev: DXT2
      Type: Direct-Access ANSI SCSI revision: 05

========================================
/etc/fstab

    /dev/sda1     /            ext4    defaults                                  1 1
    /dev/sda2     /home/       ext4    defaults                                  1 2
    /dev/sda3     /media/hd1   ext4    defaults                                  1 2
    #/dev/cdrom   /mnt/cdrom   auto    noauto,owner,ro,comment=x-gvfs-show       0 0
    /dev/fd0      /mnt/floppy  auto    noauto,owner                              0 0
    devpts        /dev/pts     devpts  gid=5,mode=620                            0 0
    proc          /proc        proc    defaults                                  0 0
    tmpfs         /dev/shm     tmpfs   defaults                                  0 0
    tmpfs         /tmp         tmpfs   defaults,noatime,nodiratime,mode=1777     0 0
    tmpfs         /var/spool   tmpfs   defaults,noatime,nodiratime,mode=1777     0 0
    tmpfs         /var/tmp     tmpfs   defaults,noatime,nodiratime,mode=1777     0 0

/proc/version

    Linux version 3.12.0-rc6 (root@darkstar) (gcc version 4.8.1 (GCC) ) #1 SMP Sun Oct 27 19:02:16 CET 2013

Attached you will find the relevant part of dmesg.
Thanks for your help.
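
Because the failure cannot be reproduced on demand, a quick tally of the libata errors in a saved dmesg can show whether every failure has the same shape. The sketch below is only an illustration: the function name `summarize_ata_failures` and the two regular expressions are assumptions based on the log lines quoted in this thread, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the dmesg lines quoted in this report (an assumption,
# not a stable kernel interface).
FAILED_RE = re.compile(r"\] (ata\d+\.\d+): failed command: (.+)$")
EMASK_RE = re.compile(r"Emask (0x[0-9a-fA-F]+) \((.+?)\)")


def summarize_ata_failures(dmesg_text: str):
    """Count failed ATA commands per (device, command) and per Emask reason."""
    failed = Counter()
    reasons = Counter()
    for line in dmesg_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed[(match.group(1), match.group(2).strip())] += 1
        match = EMASK_RE.search(line)
        if match:
            reasons[match.group(2)] += 1
    return failed, reasons


# Excerpt copied from the errors quoted in this thread.
SAMPLE = """\
[13546.661310] ata4.00: failed command: WRITE FPDMA QUEUED
[13546.661315] ata4.00: cmd 61/08:00:2f:1d:0a/00:00:00:00:00/40 tag 0 ncq 4096 out
[13546.661315]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[13546.661318] ata4.00: status: { DRDY }
[13546.661319] ata4.00: failed command: WRITE FPDMA QUEUED
[13546.661323] ata4.00: cmd 61/08:08:27:1d:0a/00:00:00:00:00/40 tag 1 ncq 4096 out
[13546.661323]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[13546.661325] ata4.00: status: { DRDY }
"""

if __name__ == "__main__":
    failed, reasons = summarize_ata_failures(SAMPLE)
    for (device, command), count in failed.items():
        print(device, command, count)
    for reason, count in reasons.items():
        print(reason, count)
```

In the excerpt, every failed command is a WRITE FPDMA QUEUED that ends in a timeout. WRITE FPDMA QUEUED is an NCQ command, so a pattern like this is at least consistent with trying libata.force=noncq, although it does not by itself prove where the fault lies.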