Bug 93581

Summary: 3.17..3.19 all fail with new Seagate Archive 8TB S-ATA disk (NCQ timeouts)
Product: IO/Storage Reporter: Jens (jens-bugzilla.kernel.org)
Component: Serial ATAAssignee: Tejun Heo (tj)
Status: REOPENED ---    
Severity: normal CC: alex.wellmann, anatol.pomozov, anthony, big.nemesis, br, brking, c.koke, calestyo, cbruun, cJ-kernel, crux, david.eisner, david, dev, dev, georg, hatl, info, ingolf.becker, jmaggard10, joerg.niemoeller, kernel, m+kernel-bugs, martin.petersen, mkp, oliver, piotr.oniszczuk, robert.krig+kernel_bug_tracker, russell.sim, S.Braendlin, shan, sweeneygj, szg00000, ted, tormen, towil-kernelbugs, tw.haoto, villeneuve, zerks0
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.19 Subsystem:
Regression: No Bisected commit-id:
Attachments: HDD info (hdparm -I)
Workaround patch for kernel 4.0+
Bisect to locate error for Seagate ST8000AS0002 archive drive
Bisect 4.1 to 4.2 to locate 2nd Seagate SMR bug
dmesg of the session where I'm experiencing problems on 3.14, ncq disabled
Logs of the session a day after Comment #53
dmesg and smartctl after letting the drive 'settle'
dmesg from boot to drive failure after rsyncing ~250 GB of data
Log of drives interfering with other disks on same controller

Description Jens 2015-02-21 14:32:13 UTC
Created attachment 167811 [details]
HDD info (hdparm -I)

I have a brand new Seagate 8TB archive disk:

 Model=ST8000AS0002-1NA17Z, FwRev=AR13, SerialNo=.....

I formatted this with LUKS + ext4 to use as an archive disk under Ubuntu 14.04 with an updated mainline kernel (3.19.0-031900-generic #201502091451). I use this instead of the stock Ubuntu kernel due to buggy Haswell PM support in everything < 3.17.

The mainboard is a MSI B85M-P45 with BIOS 10.5 first, but I also upgraded to 10.8 and tried again, no help.

hdparm and smartmontools all report no errors with the device. I also tested multiple SATA cables and power cables, which work fine with other SATA devices (one SSD and one WD HDD).

When I copy data onto this disk (using rsync or the Ubuntu desktop), after some time (~200MB to ~5GB) the copy process stops and I see this in dmesg:

[  329.994134] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[  329.994148] ata1.00: failed command: WRITE FPDMA QUEUED
[  329.994151] ata1.00: cmd 61/40:00:80:ef:db/05:00:01:00:00/40 tag 0 ncq 688128 out
[  329.994151]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  329.994152] ata1.00: status: { DRDY }
[  329.994154] ata1.00: failed command: WRITE FPDMA QUEUED
[  329.994155] ata1.00: cmd 61/40:08:c0:f4:db/05:00:01:00:00/40 tag 1 ncq 688128 out
[  329.994155]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  329.994157] ata1.00: status: { DRDY }
[  329.994157] ata1.00: failed command: WRITE FPDMA QUEUED
(repeated)

[  329.994254] ata1.00: cmd 61/40:f0:40:ea:db/05:00:01:00:00/40 tag 30 ncq 688128 out
[  329.994254]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  329.994255] ata1.00: status: { DRDY }
[  329.994257] ata1: hard resetting link
[  335.356764] ata1: link is slow to respond, please be patient (ready=0)
[  340.007083] ata1: COMRESET failed (errno=-16)
[  340.007088] ata1: hard resetting link
[  345.377757] ata1: link is slow to respond, please be patient (ready=0)
[  350.028074] ata1: COMRESET failed (errno=-16)
[  350.028078] ata1: hard resetting link
[  355.394766] ata1: link is slow to respond, please be patient (ready=0)
[  366.312881] sdd: detected capacity change from 31876710400 to 0

After I see this, the disk is usually inaccessible and unusable until I hard reset (power cycle) the system. A soft reboot does not help.

With "libata.force=noncq" I get this instead:

[   43.622595] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[   89.804533] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   89.804548] ata1.00: failed command: WRITE DMA EXT
[   89.804551] ata1.00: cmd 35/00:80:00:d0:e4/00:4a:01:00:00/e0 tag 14 dma 9764864 out
[   89.804551]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   89.804552] ata1.00: status: { DRDY }
[   89.804554] ata1: hard resetting link
[   90.236318] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   90.883929] ata1.00: configured for UDMA/133
[   90.884028] ata1.00: device reported invalid CHS sector 0
[   90.884031] ata1: EH complete
[  173.794464] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  173.794479] ata1.00: failed command: WRITE DMA EXT
[  173.794481] ata1.00: cmd 35/00:80:80:84:2f/00:4a:02:00:00/e0 tag 3 dma 9764864 out
[  173.794481]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  173.794482] ata1.00: status: { DRDY }
[  173.794485] ata1: hard resetting link
[  174.226249] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  174.737999] ata1.00: configured for UDMA/133
[  174.738171] ata1.00: device reported invalid CHS sector 0
[  174.738175] ata1: EH complete
[  208.784935] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  208.784950] ata1.00: failed command: WRITE DMA EXT
[  208.784952] ata1.00: cmd 35/00:c0:40:d5:3b/00:3a:02:00:00/e0 tag 13 dma 7700480 out
[  208.784952]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  208.784954] ata1.00: status: { DRDY }
[  208.784956] ata1: hard resetting link
[  214.142219] ata1: link is slow to respond, please be patient (ready=0)
[  218.787931] ata1: COMRESET failed (errno=-16)
[  218.787946] ata1: hard resetting link
[  224.145236] ata1: link is slow to respond, please be patient (ready=0)
[  228.790896] ata1: COMRESET failed (errno=-16)
[  228.790912] ata1: hard resetting link
[  234.148199] ata1: link is slow to respond, please be patient (ready=0)
[  263.825359] ata1: COMRESET failed (errno=-16)
[  263.825364] ata1: limiting SATA link speed to 3.0 Gbps
[  263.825365] ata1: hard resetting link
[  268.846805] ata1: COMRESET failed (errno=-16)
[  268.846810] ata1: reset failed, giving up
[  268.846812] ata1.00: disabled
[  268.846816] ata1.00: device reported invalid CHS sector 0
[  268.846822] ata1: EH complete
[  268.846866] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.846867] sd 0:0:0:0: [sda] CDB: 
[  268.846868] Write(16): 8a 00 00 00 00 00 02 3b d5 40 00 00 3a c0 00 00
[  268.846873] blk_update_request: I/O error, dev sda, sector 37475648
[  268.846909] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 109051904 size 4194304 starting block 4683944)
[  268.846910] Buffer I/O error on device dm-3, logical block 4683944
[  268.846913] Buffer I/O error on device dm-3, logical block 4683945
[  268.846914] Buffer I/O error on device dm-3, logical block 4683946
[  268.846914] Buffer I/O error on device dm-3, logical block 4683947
[  268.846915] Buffer I/O error on device dm-3, logical block 4683948
[  268.846916] Buffer I/O error on device dm-3, logical block 4683949
[  268.846917] Buffer I/O error on device dm-3, logical block 4683950
[  268.846918] Buffer I/O error on device dm-3, logical block 4683951
[  268.846919] Buffer I/O error on device dm-3, logical block 4683952
[  268.846920] Buffer I/O error on device dm-3, logical block 4683953
[  268.846969] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 109051904 size 4194304 starting block 4684112)
[  268.847011] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 109051904 size 4194304 starting block 4684280)
[  268.847056] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 109051904 size 4194304 starting block 4684448)
[  268.847096] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 109051904 size 4194304 starting block 4684616)
[  268.847125] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 109051904 size 4194304 starting block 4684784)
[  268.847142] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 113246208 size 4194304 starting block 4684800)
[  268.847188] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 113246208 size 4194304 starting block 4684968)
[  268.847230] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 113246208 size 4194304 starting block 4685136)
[  268.847272] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 201195635 (offset 113246208 size 4194304 starting block 4685304)
[  268.847431] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.847432] sd 0:0:0:0: [sda] CDB: 
[  268.847433] Write(16): 8a 00 00 00 00 00 02 3c 10 00 00 00 45 40 00 00
[  268.847437] blk_update_request: I/O error, dev sda, sector 37490688
[  268.848085] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.848086] sd 0:0:0:0: [sda] CDB: 
[  268.848087] Write(16): 8a 00 00 00 00 00 02 3c 55 40 00 00 45 40 00 00
[  268.848103] blk_update_request: I/O error, dev sda, sector 37508416
[  268.849382] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.849384] sd 0:0:0:0: [sda] CDB: 
[  268.849385] Write(16): 8a 00 00 00 00 00 02 3c 9a 80 00 00 45 40 00 00
[  268.849390] blk_update_request: I/O error, dev sda, sector 37526144
[  268.849909] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.849910] sd 0:0:0:0: [sda] CDB: 
[  268.849911] Write(16): 8a 00 00 00 00 00 02 3c df c0 00 00 04 40 00 00
[  268.849915] blk_update_request: I/O error, dev sda, sector 37543872
[  268.849951] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.849952] sd 0:0:0:0: [sda] CDB: 
[  268.849952] Write(16): 8a 00 00 00 00 00 02 3c e4 00 00 00 05 40 00 00
[  268.849956] blk_update_request: I/O error, dev sda, sector 37544960
[  268.850011] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.850012] sd 0:0:0:0: [sda] CDB: 
[  268.850012] Read(16): 88 00 00 00 00 02 ff 84 10 28 00 00 00 08 00 00
[  268.850016] blk_update_request: I/O error, dev sda, sector 12876779560
[  268.850023] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.850024] sd 0:0:0:0: [sda] CDB: 
[  268.850024] Write(16): 8a 00 00 00 00 00 16 00 31 00 00 00 08 00 00 00
[  268.850028] blk_update_request: I/O error, dev sda, sector 369111296
[  268.850068] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.850070] sd 0:0:0:0: [sda] CDB: 
[  268.850071] Write(16): 8a
[  268.850073] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  268.850074] sd 0:0:0:0: [sda] CDB: 
[  268.850074] Write(16):
[  268.850075]  00 00 00 00 02 ff 80 10 08 00 00 00 08 00 00
[  268.850092] blk_update_request: I/O error, dev sda, sector 12876517384
[  268.850093]  8a 00 00 00 00 02 ff 80
[  268.850095] Buffer I/O error on dev dm-3, logical block 1609564161, lost async page write
[  268.850096]  10 80 00 00 00 08 00 00
[  268.850098] blk_update_request: I/O error, dev sda, sector 12876517504
[  268.850100] Buffer I/O error on dev dm-3, logical block 1609564176, lost async page write
[  268.850101] Buffer I/O error on dev dm-3, logical block 1609564192, lost async page write
[  268.850105] Buffer I/O error on dev dm-3, logical block 1609568289, lost async page write
[  268.850107] Buffer I/O error on dev dm-3, logical block 1609596954, lost async page write
[  268.850214] EXT4-fs error (device dm-3): ext4_find_entry:1289: inode #201195523: comm pool: reading directory lblock 0
[  268.850907] Buffer I/O error on dev dm-3, logical block 11, lost async page write
[  268.850911] Buffer I/O error on dev dm-3, logical block 384, lost async page write
[  268.850914] Buffer I/O error on dev dm-3, logical block 4194313, lost async page write
[  268.850916] Buffer I/O error on dev dm-3, logical block 4194314, lost async page write
[  268.850918] Buffer I/O error on dev dm-3, logical block 4194315, lost async page write
[  268.851198] Aborting journal on device dm-3-8.
[  268.851214] EXT4-fs (dm-3): Delayed block allocation failed for inode 201195635 at logical offset 44672 with max blocks 2048 with error 30
[  268.851215] EXT4-fs (dm-3): This should not happen!! Data will be lost
[  268.851215] 
[  268.851217] EXT4-fs error (device dm-3) in ext4_writepages:2395: Journal has aborted
[  268.851218] EXT4-fs (dm-3): previous I/O error to superblock detected
[  268.851238] JBD2: Error -5 detected when updating journal superblock for dm-3-8.
[  268.851491] EXT4-fs (dm-3): previous I/O error to superblock detected
[  268.851543] EXT4-fs error (device dm-3): ext4_journal_check_start:56: Detected aborted journal
[  268.851545] EXT4-fs (dm-3): Remounting filesystem read-only
[  268.851557] EXT4-fs (dm-3): previous I/O error to superblock detected
[  268.851598] EXT4-fs error (device dm-3) in ext4_da_write_end:2652: IO failure
[  268.851610] EXT4-fs (dm-3): previous I/O error to superblock detected
[  298.891785] scsi_io_completion: 31 callbacks suppressed
[  298.891799] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  298.891800] sd 0:0:0:0: [sda] CDB: 
[  298.891801] Write(16): 8a 00 00 00 00 00 00 00 10 60 00 00 00 08 00 00
[  298.891805] blk_update_request: 31 callbacks suppressed
[  298.891806] blk_update_request: I/O error, dev sda, sector 4192
[  298.891809] buffer_io_error: 7 callbacks suppressed
[  298.891810] Buffer I/O error on dev dm-3, logical block 12, lost async page write
[  359.379609] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  359.379621] sd 0:0:0:0: [sda] CDB: 
[  359.379622] Read(16): 88 00 00 00 00 02 ff 80 91 08 00 00 00 08 00 00
[  359.379627] blk_update_request: I/O error, dev sda, sector 12876550408
[  359.379651] EXT4-fs error (device dm-3): ext4_find_entry:1289: inode #201195523: comm rsync: reading directory lblock 0
[  359.379653] EXT4-fs (dm-3): previous I/O error to superblock detected

Sometimes I also see this:

[  202.092355] ata1: link is slow to respond, please be patient (ready=0)
[  206.740359] ata1: COMRESET failed (errno=-16)
[  206.740363] ata1: hard resetting link
[  212.100352] ata1: link is slow to respond, please be patient (ready=0)
[  216.748316] ata1: COMRESET failed (errno=-16)
[  216.748321] ata1: hard resetting link
[  222.108331] ata1: link is slow to respond, please be patient (ready=0)
[  251.800305] ata1: COMRESET failed (errno=-16)
[  251.800310] ata1: limiting SATA link speed to 3.0 Gbps
[  251.800311] ata1: hard resetting link
[  256.824303] ata1: COMRESET failed (errno=-16)
[  256.824308] ata1: reset failed, giving up
[  256.824309] ata1.00: disabled
[  256.824311] ata1.00: device reported invalid CHS sector 0
[  256.824312] ata1.00: device reported invalid CHS sector 0
[  256.824314] ata1.00: device reported invalid CHS sector 0
[  256.824315] ata1.00: device reported invalid CHS sector 0
[  256.824315] ata1.00: device reported invalid CHS sector 0
[  256.824316] ata1.00: device reported invalid CHS sector 0
[  256.824317] ata1.00: device reported invalid CHS sector 0
[  256.824318] ata1.00: device reported invalid CHS sector 0
[  256.824319] ata1.00: device reported invalid CHS sector 0
[  256.824320] ata1.00: device reported invalid CHS sector 0
[  256.824322] ata1.00: device reported invalid CHS sector 0
[  256.824324] ata1.00: device reported invalid CHS sector 0
[  256.824325] ata1.00: device reported invalid CHS sector 0
[  256.824327] ata1.00: device reported invalid CHS sector 0
[  256.824328] ata1.00: device reported invalid CHS sector 0
[  256.824330] ata1.00: device reported invalid CHS sector 0
[  256.824331] ata1.00: device reported invalid CHS sector 0
[  256.824332] ata1.00: device reported invalid CHS sector 0
[  256.824333] ata1.00: device reported invalid CHS sector 0
[  256.824335] ata1.00: device reported invalid CHS sector 0
[  256.824337] ata1.00: device reported invalid CHS sector 0
[  256.824338] ata1.00: device reported invalid CHS sector 0
[  256.824340] ata1.00: device reported invalid CHS sector 0
[  256.824341] ata1.00: device reported invalid CHS sector 0
[  256.824342] ata1.00: device reported invalid CHS sector 0
[  256.824343] ata1.00: device reported invalid CHS sector 0
[  256.824344] ata1.00: device reported invalid CHS sector 0
[  256.824345] ata1.00: device reported invalid CHS sector 0
[  256.824346] ata1.00: device reported invalid CHS sector 0
[  256.824347] ata1.00: device reported invalid CHS sector 0
[  256.824348] ata1.00: device reported invalid CHS sector 0
[  256.824357] ata1: EH complete
[  256.824374] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824376] sd 0:0:0:0: [sda] CDB: 
[  256.824376] Write(16): 8a 00 00 00 00 00 02 a2 1a 80 00 00 05 40 00 00
[  256.824381] blk_update_request: I/O error, dev sda, sector 44178048
[  256.824403] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 134217728 size 8388608 starting block 5521744)
[  256.824405] Buffer I/O error on device dm-4, logical block 5521744
[  256.824407] Buffer I/O error on device dm-4, logical block 5521745
[  256.824408] Buffer I/O error on device dm-4, logical block 5521746
[  256.824409] Buffer I/O error on device dm-4, logical block 5521747
[  256.824410] Buffer I/O error on device dm-4, logical block 5521748
[  256.824410] Buffer I/O error on device dm-4, logical block 5521749
[  256.824411] Buffer I/O error on device dm-4, logical block 5521750
[  256.824412] Buffer I/O error on device dm-4, logical block 5521751
[  256.824413] Buffer I/O error on device dm-4, logical block 5521752
[  256.824414] Buffer I/O error on device dm-4, logical block 5521753
[  256.824453] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824453] sd 0:0:0:0: [sda] CDB: 
[  256.824454] Write(16): 8a 00 00 00 00 00 02 a2 15 40 00 00 05 40 00 00
[  256.824458] blk_update_request: I/O error, dev sda, sector 44176704
[  256.824473] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 134217728 size 8388608 starting block 5521576)
[  256.824528] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824529] sd 0:0:0:0: [sda] CDB: 
[  256.824529] Write(16): 8a 00 00 00 00 00 02 a2 10 00 00 00 05 40 00 00
[  256.824533] blk_update_request: I/O error, dev sda, sector 44175360
[  256.824550] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 134217728 size 8388608 starting block 5521408)
[  256.824604] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824605] sd 0:0:0:0: [sda] CDB: 
[  256.824605] Write(16): 8a 00 00 00 00 00 02 a2 0f 00 00 00 01 00 00 00
[  256.824609] blk_update_request: I/O error, dev sda, sector 44175104
[  256.824613] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5521376)
[  256.824641] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824642] sd 0:0:0:0: [sda] CDB: 
[  256.824642] Write(16): 8a 00 00 00 00 00 02 a2 09 c0 00 00 05 40 00 00
[  256.824646] blk_update_request: I/O error, dev sda, sector 44173760
[  256.824660] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5521208)
[  256.824709] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824710] sd 0:0:0:0: [sda] CDB: 
[  256.824710] Write(16): 8a 00 00 00 00 00 02 a2 04 80 00 00 05 40 00 00
[  256.824714] blk_update_request: I/O error, dev sda, sector 44172416
[  256.824729] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5521040)
[  256.824780] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824781] sd 0:0:0:0: [sda] CDB: 
[  256.824782] Write(16): 8a 00 00 00 00 00 02 a1 ff 40 00 00 05 40 00 00
[  256.824785] blk_update_request: I/O error, dev sda, sector 44171072
[  256.824800] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5520872)
[  256.824851] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824851] sd 0:0:0:0: [sda] CDB: 
[  256.824852] Write(16): 8a 00 00 00 00 00 02 a1 fa 00 00 00 05 40 00 00
[  256.824856] blk_update_request: I/O error, dev sda, sector 44169728
[  256.824872] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5520704)
[  256.824923] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824924] sd 0:0:0:0: [sda] CDB: 
[  256.824924] Write(16): 8a 00 00 00 00 00 02 a1 f4 c0 00 00 05 40 00 00
[  256.824928] blk_update_request: I/O error, dev sda, sector 44168384
[  256.824943] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5520536)
[  256.824994] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  256.824995] sd 0:0:0:0: [sda] CDB: 
[  256.824996] Write(16): 8a 00 00 00 00 00 02 a1 ef 80 00 00 05 40 00 00
[  256.825000] blk_update_request: I/O error, dev sda, sector 44167040
[  256.825018] EXT4-fs warning (device dm-4): ext4_end_bio:317: I/O error -5 writing to inode 201195637 (offset 125829120 size 8388608 starting block 5520368)
[  256.828553] EXT4-fs error (device dm-4) in ext4_ext_truncate:4669: IO failure
[  256.833847] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[  256.839020] Aborting journal on device dm-4-8.
[  256.839052] Buffer I/O error on dev dm-4, logical block 976781312, lost sync page write
[  256.839070] JBD2: Error -5 detected when updating journal superblock for dm-4-8.
[  272.135941] EXT4-fs (dm-4): previous I/O error to superblock detected
[  272.135976] scsi_io_completion: 84 callbacks suppressed
[  272.135979] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  272.135980] sd 0:0:0:0: [sda] CDB: 
[  272.135981] Write(16): 8a 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[  272.135986] blk_update_request: 84 callbacks suppressed
[  272.135987] blk_update_request: I/O error, dev sda, sector 4096
[  272.135990] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[  272.135995] EXT4-fs error (device dm-4): ext4_journal_check_start:56: Detected aborted journal
[  272.135996] EXT4-fs (dm-4): Remounting filesystem read-only
[  272.135997] EXT4-fs (dm-4): previous I/O error to superblock detected
[  272.136005] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  272.136006] sd 0:0:0:0: [sda] CDB: 
[  272.136006] Write(16): 8a 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[  272.136011] blk_update_request: I/O error, dev sda, sector 4096
[  272.136012] Buffer I/O error on dev dm-4, logical block 0, lost sync page write

After this happens, the disk is remounted read-only. Unmounting works (usually), but when I remount I get this:

[  375.476195] EXT4-fs (dm-4): previous I/O error to superblock detected
[  375.476273] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  375.476274] sd 0:0:0:0: [sda] CDB: 
[  375.476275] Write(16): 8a 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[  375.476280] blk_update_request: I/O error, dev sda, sector 4096
[  375.476285] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[  375.476290] EXT4-fs error (device dm-4): ext4_put_super:780: Couldn't clean up the journal
[  375.479299] ------------[ cut here ]------------
[  375.479305] WARNING: CPU: 2 PID: 3233 at /home/kernel/COD/linux/fs/block_dev.c:67 bdev_inode_switch_bdi+0x86/0x90()
[  375.479306] Modules linked in: snd_hda_codec_hdmi(E) intel_rapl(E) iosf_mbi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E) serio_raw(E) snd_pcm(E) snd_seq(E) shpchp(E) mei_me(E) mei(E) lpc_ich(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) tpm_infineon(E) 8250_fintek(E) intel_smartconnect(E) mac_hid(E) rfcomm(E) bnep(E) bluetooth(E) parport_pc(E) ppdev(E) nls_iso8859_1(E) lp(E) parport(E) uas(E) usb_storage(E) dm_crypt(E) hid_generic(E) usbhid(E) hid(E) mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) i915(E) i2c_algo_bit(E) ahci(E) libahci(E) drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
[  375.479332] CPU: 2 PID: 3233 Comm: umount Tainted: G            E  3.19.0-031900-generic #201502091451
[  375.479333] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.8 12/11/2014
[  375.479334]  0000000000000043 ffff88014d177d68 ffffffff817c4c00 0000000000000007
[  375.479335]  0000000000000000 ffff88014d177da8 ffffffff81076e87 0000000000000000
[  375.479337]  ffff8802146031b0 ffff880214603238 ffffffff81c66d00 ffff8801d236ac00
[  375.479338] Call Trace:
[  375.479342]  [<ffffffff817c4c00>] dump_stack+0x45/0x57
[  375.479345]  [<ffffffff81076e87>] warn_slowpath_common+0x97/0xe0
[  375.479346]  [<ffffffff81076eea>] warn_slowpath_null+0x1a/0x20
[  375.479348]  [<ffffffff8122bd26>] bdev_inode_switch_bdi+0x86/0x90
[  375.479349]  [<ffffffff8122cf81>] __blkdev_put+0x81/0x1b0
[  375.479350]  [<ffffffff8122d10b>] blkdev_put+0x5b/0x160
[  375.479353]  [<ffffffff811f706d>] kill_block_super+0x4d/0x80
[  375.479354]  [<ffffffff811f72a9>] deactivate_locked_super+0x59/0x80
[  375.479356]  [<ffffffff811f7f6e>] deactivate_super+0x4e/0x70
[  375.479358]  [<ffffffff81213a93>] cleanup_mnt+0x43/0x90
[  375.479360]  [<ffffffff81213b32>] __cleanup_mnt+0x12/0x20
[  375.479362]  [<ffffffff81093e8f>] task_work_run+0xaf/0xf0
[  375.479364]  [<ffffffff81015077>] do_notify_resume+0xc7/0xd0
[  375.479367]  [<ffffffff817d21cf>] int_signal+0x12/0x17
[  375.479368] ---[ end trace a6291ba43b98e010 ]---
[  375.485916] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  375.485919] sd 0:0:0:0: [sda] CDB: 
[  375.485920] Read(16): 88 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00
[  375.485935] blk_update_request: I/O error, dev sda, sector 0
[  377.355569] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  377.355582] sd 0:0:0:0: [sda] CDB: 
[  377.355583] Read(16): 88 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00
[  377.355588] blk_update_request: I/O error, dev sda, sector 0
[  380.211836] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  380.211839] sd 0:0:0:0: [sda] CDB: 
[  380.211840] Read(16): 88 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00
[  380.211845] blk_update_request: I/O error, dev sda, sector 0
[  380.219851] show_signal_msg: 168 callbacks suppressed
[  380.219855] gvfs-udisks2-vo[2354]: segfault at 8 ip 000000000040fb14 sp 00007fffed821a70 error 4 in gvfs-udisks2-volume-monitor[400000+24000]

After this the disk is totally inaccessible.

How can I fix this? Any help welcome.

Thank you!
Comment 1 Jens 2015-02-21 21:45:06 UTC
I now compiled linux-drm-nightly as of today (git 89271faca1 from cgit.freedesktop.org) and the situation has improved a little: when writing to the disk, I get the below output about once a minute, and a pause and (probably) a bus reset, but no more crashes and unusable disks until reboot.

[  597.912997] ata1.00: device reported invalid CHS sector 0
[  597.912999] ata1.00: device reported invalid CHS sector 0
[  597.913001] ata1.00: device reported invalid CHS sector 0
[  597.913009] ata1: EH complete
[  735.622011] ata1.00: NCQ disabled due to excessive errors
[  735.622014] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[  735.622017] ata1.00: failed command: WRITE FPDMA QUEUED
[  735.622019] ata1.00: cmd 61/c0:00:c0:1f:78/0f:00:05:00:00/40 tag 0 ncq 2064384 out
[  735.622019]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[  735.622021] ata1.00: status: { DRDY }
[  735.622022] ata1.00: failed command: WRITE FPDMA QUEUED
[  735.622023] ata1.00: cmd 61/00:08:80:2f:78/2b:00:05:00:00/40 tag 1 ncq 5636096 out
[  735.622023]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
...
[  735.622118] ata1.00: status: { DRDY }
[  735.622119] ata1.00: failed command: WRITE FPDMA QUEUED
[  735.622121] ata1.00: cmd 61/40:f0:80:1a:78/05:00:05:00:00/40 tag 30 ncq 688128 out
[  735.622121]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[  735.622122] ata1.00: status: { DRDY }
[  735.622123] ata1: hard resetting link
[  736.166005] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  736.168745] ata1.00: configured for UDMA/133
[  736.168855] ata1.00: device reported invalid CHS sector 0
[  736.168857] ata1.00: device reported invalid CHS sector 0
...


I also tried to use "libata.force=1:1.5G" to artificially slow down the wire speed and this seems to make those errors appear less often, while also slowing down the disk noticeably.

Anything else I should or can try?
Comment 2 Jens 2015-02-25 21:13:49 UTC
Closing. It was a bad disk after all - Linux was right in spitting countless errors at me :-(
Comment 3 Jens 2015-04-12 07:50:48 UTC
I have to reopen this bug. Even with 4.0.0-rc7 (freedesktop git checkout as of yesterday), after changing cabling, replacing the disk itself, changing the harddisk position inside the PC case and retrying all of the above again, I still get these timeout errors with all consequences (filesystem being corrupt, remounted read-only, etc) when copying large amounts of data.

The new disk is - according to SMART - perfectly OK and works as long as I do not copy large amounts of data (roughly for up to 30 minutes). In the syslog output, I can detect no references to hardware errors - only timeouts. This might be due to the "shingled tracks" of this disk, which require rewriting adjacent tracks after write processes, which might cause the timeouts on the OS side (but that's just a guess).

Anything else I can try? Is there a way to increase the DMA timeout values so the OS waits longer? 

Here is my hardware again:

00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05) (prog-if 01 [AHCI 1.0])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7817
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 27
	I/O ports at f0b0 [size=8]
	I/O ports at f0a0 [size=4]
	I/O ports at f090 [size=8]
	I/O ports at f080 [size=4]
	I/O ports at f060 [size=32]
	Memory at f7d1a000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA v1.0
	Kernel driver in use: ahci

[    0.936755] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    0.938199] ata2.00: ATA-9: ST8000AS0002-1NA17Z, AR13, max UDMA/133
[    0.938201] ata2.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA


Here is the syslog output:

[ 1481.390943] ata2.00: exception Emask 0x0 SAct 0x3fffffff SErr 0x0 action 0x6 frozen
[ 1481.390957] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390960] ata2.00: cmd 61/c0:00:c0:89:1d/25:00:55:00:00/40 tag 0 ncq 4947968 out
[ 1481.390960]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390961] ata2.00: status: { DRDY }
[ 1481.390962] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390964] ata2.00: cmd 61/00:08:80:af:1d/15:00:55:00:00/40 tag 1 ncq 2752512 out
[ 1481.390964]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390965] ata2.00: status: { DRDY }
[ 1481.390966] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390968] ata2.00: cmd 61/40:10:80:c4:1d/05:00:55:00:00/40 tag 2 ncq 688128 out
[ 1481.390968]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1481.390969] ata2.00: status: { DRDY }
[ 1481.390969] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390971] ata2.00: cmd 61/80:18:c0:c9:1d/20:00:55:00:00/40 tag 3 ncq 4259840 out
[ 1481.390971]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390972] ata2.00: status: { DRDY }
[ 1481.390973] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390975] ata2.00: cmd 61/00:20:40:ea:1d/15:00:55:00:00/40 tag 4 ncq 2752512 out
[ 1481.390975]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390975] ata2.00: status: { DRDY }
[ 1481.390976] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390978] ata2.00: cmd 61/c0:28:40:ff:1d/10:00:55:00:00/40 tag 5 ncq 2195456 out
[ 1481.390978]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390979] ata2.00: status: { DRDY }
[ 1481.390980] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390981] ata2.00: cmd 61/c0:30:00:10:1e/0f:00:55:00:00/40 tag 6 ncq 2064384 out
[ 1481.390981]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390982] ata2.00: status: { DRDY }
[ 1481.390983] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390985] ata2.00: cmd 61/80:38:c0:1f:1e/1f:00:55:00:00/40 tag 7 ncq 4128768 out
[ 1481.390985]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390986] ata2.00: status: { DRDY }
[ 1481.390986] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390988] ata2.00: cmd 61/c0:40:40:3f:1e/10:00:55:00:00/40 tag 8 ncq 2195456 out
[ 1481.390988]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390989] ata2.00: status: { DRDY }
[ 1481.390990] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390991] ata2.00: cmd 61/40:48:00:50:1e/05:00:55:00:00/40 tag 9 ncq 688128 out
[ 1481.390991]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390992] ata2.00: status: { DRDY }
[ 1481.390993] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390995] ata2.00: cmd 61/40:50:40:55:1e/05:00:55:00:00/40 tag 10 ncq 688128 out
[ 1481.390995]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390996] ata2.00: status: { DRDY }
[ 1481.390996] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.390998] ata2.00: cmd 61/40:58:80:5a:1e/05:00:55:00:00/40 tag 11 ncq 688128 out
[ 1481.390998]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.390999] ata2.00: status: { DRDY }
[ 1481.391000] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391002] ata2.00: cmd 61/40:60:c0:5f:1e/05:00:55:00:00/40 tag 12 ncq 688128 out
[ 1481.391002]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391002] ata2.00: status: { DRDY }
[ 1481.391003] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391005] ata2.00: cmd 61/40:68:00:65:1e/05:00:55:00:00/40 tag 13 ncq 688128 out
[ 1481.391005]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391006] ata2.00: status: { DRDY }
[ 1481.391007] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391008] ata2.00: cmd 61/40:70:40:6a:1e/05:00:55:00:00/40 tag 14 ncq 688128 out
[ 1481.391008]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391009] ata2.00: status: { DRDY }
[ 1481.391010] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391012] ata2.00: cmd 61/40:78:80:6f:1e/05:00:55:00:00/40 tag 15 ncq 688128 out
[ 1481.391012]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391012] ata2.00: status: { DRDY }
[ 1481.391013] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391015] ata2.00: cmd 61/40:80:c0:74:1e/05:00:55:00:00/40 tag 16 ncq 688128 out
[ 1481.391015]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391016] ata2.00: status: { DRDY }
[ 1481.391016] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391018] ata2.00: cmd 61/40:88:00:7a:1e/05:00:55:00:00/40 tag 17 ncq 688128 out
[ 1481.391018]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1481.391019] ata2.00: status: { DRDY }
[ 1481.391020] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391021] ata2.00: cmd 61/40:90:40:7f:1e/05:00:55:00:00/40 tag 18 ncq 688128 out
[ 1481.391021]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391022] ata2.00: status: { DRDY }
[ 1481.391023] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391025] ata2.00: cmd 61/40:98:80:84:1e/05:00:55:00:00/40 tag 19 ncq 688128 out
[ 1481.391025]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391026] ata2.00: status: { DRDY }
[ 1481.391026] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391028] ata2.00: cmd 61/40:a0:c0:89:1e/06:00:55:00:00/40 tag 20 ncq 819200 out
[ 1481.391028]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391029] ata2.00: status: { DRDY }
[ 1481.391030] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391031] ata2.00: cmd 61/40:a8:00:90:1e/05:00:55:00:00/40 tag 21 ncq 688128 out
[ 1481.391031]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1481.391032] ata2.00: status: { DRDY }
[ 1481.391033] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391035] ata2.00: cmd 61/80:b0:40:95:1e/0a:00:55:00:00/40 tag 22 ncq 1376256 out
[ 1481.391035]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391036] ata2.00: status: { DRDY }
[ 1481.391036] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391038] ata2.00: cmd 61/40:b8:c0:9f:1e/05:00:55:00:00/40 tag 23 ncq 688128 out
[ 1481.391038]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391039] ata2.00: status: { DRDY }
[ 1481.391040] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391042] ata2.00: cmd 61/c0:c0:00:a5:1e/0f:00:55:00:00/40 tag 24 ncq 2064384 out
[ 1481.391042]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391043] ata2.00: status: { DRDY }
[ 1481.391043] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391045] ata2.00: cmd 61/c0:c8:c0:b4:1e/0f:00:55:00:00/40 tag 25 ncq 2064384 out
[ 1481.391045]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1481.391046] ata2.00: status: { DRDY }
[ 1481.391047] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391048] ata2.00: cmd 61/40:d0:80:c4:1e/05:00:55:00:00/40 tag 26 ncq 688128 out
[ 1481.391048]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1481.391049] ata2.00: status: { DRDY }
[ 1481.391050] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391052] ata2.00: cmd 61/40:d8:c0:c9:1e/05:00:55:00:00/40 tag 27 ncq 688128 out
[ 1481.391052]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391053] ata2.00: status: { DRDY }
[ 1481.391053] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391055] ata2.00: cmd 61/40:e0:00:cf:1e/06:00:55:00:00/40 tag 28 ncq 819200 out
[ 1481.391055]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391056] ata2.00: status: { DRDY }
[ 1481.391057] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1481.391058] ata2.00: cmd 61/40:e8:40:d5:1e/05:00:55:00:00/40 tag 29 ncq 688128 out
[ 1481.391058]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1481.391059] ata2.00: status: { DRDY }
[ 1481.391061] ata2: hard resetting link
[ 1481.822885] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1482.366486] ata2.00: configured for UDMA/133
[ 1482.366573] ata2.00: device reported invalid CHS sector 0
[ 1482.366576] ata2.00: device reported invalid CHS sector 0
[ 1482.366578] ata2.00: device reported invalid CHS sector 0
[ 1482.366581] ata2.00: device reported invalid CHS sector 0
[ 1482.366583] ata2.00: device reported invalid CHS sector 0
[ 1482.366586] ata2.00: device reported invalid CHS sector 0
[ 1482.366589] ata2.00: device reported invalid CHS sector 0
[ 1482.366591] ata2.00: device reported invalid CHS sector 0
[ 1482.366593] ata2.00: device reported invalid CHS sector 0
[ 1482.366596] ata2.00: device reported invalid CHS sector 0
[ 1482.366597] ata2.00: device reported invalid CHS sector 0
[ 1482.366600] ata2.00: device reported invalid CHS sector 0
[ 1482.366602] ata2.00: device reported invalid CHS sector 0
[ 1482.366604] ata2.00: device reported invalid CHS sector 0
[ 1482.366606] ata2.00: device reported invalid CHS sector 0
[ 1482.366609] ata2.00: device reported invalid CHS sector 0
[ 1482.366611] ata2.00: device reported invalid CHS sector 0
[ 1482.366613] ata2.00: device reported invalid CHS sector 0
[ 1482.366615] ata2.00: device reported invalid CHS sector 0
[ 1482.366617] ata2.00: device reported invalid CHS sector 0
[ 1482.366619] ata2.00: device reported invalid CHS sector 0
[ 1482.366621] ata2.00: device reported invalid CHS sector 0
[ 1482.366623] ata2.00: device reported invalid CHS sector 0
[ 1482.366626] ata2.00: device reported invalid CHS sector 0
[ 1482.366628] ata2.00: device reported invalid CHS sector 0
[ 1482.366630] ata2.00: device reported invalid CHS sector 0
[ 1482.366632] ata2.00: device reported invalid CHS sector 0
[ 1482.366634] ata2.00: device reported invalid CHS sector 0
[ 1482.366636] ata2.00: device reported invalid CHS sector 0
[ 1482.366637] ata2.00: device reported invalid CHS sector 0
[ 1482.366650] ata2: EH complete
[ 1633.454773] ata2.00: exception Emask 0x0 SAct 0x7fdfffff SErr 0x0 action 0x6 frozen
[ 1633.454788] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454790] ata2.00: cmd 61/40:00:00:a5:94/05:00:55:00:00/40 tag 0 ncq 688128 out
[ 1633.454790]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454791] ata2.00: status: { DRDY }
[ 1633.454792] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454794] ata2.00: cmd 61/c0:08:40:aa:94/0f:00:55:00:00/40 tag 1 ncq 2064384 out
[ 1633.454794]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454795] ata2.00: status: { DRDY }
[ 1633.454796] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454798] ata2.00: cmd 61/80:10:00:ba:94/0a:00:55:00:00/40 tag 2 ncq 1376256 out
[ 1633.454798]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1633.454799] ata2.00: status: { DRDY }
[ 1633.454800] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454801] ata2.00: cmd 61/40:18:80:c4:94/05:00:55:00:00/40 tag 3 ncq 688128 out
[ 1633.454801]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454802] ata2.00: status: { DRDY }
[ 1633.454803] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454805] ata2.00: cmd 61/40:20:c0:c9:94/05:00:55:00:00/40 tag 4 ncq 688128 out
[ 1633.454805]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454805] ata2.00: status: { DRDY }
[ 1633.454806] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454808] ata2.00: cmd 61/40:28:00:cf:94/06:00:55:00:00/40 tag 5 ncq 819200 out
[ 1633.454808]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454809] ata2.00: status: { DRDY }
[ 1633.454810] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454811] ata2.00: cmd 61/40:30:40:d5:94/05:00:55:00:00/40 tag 6 ncq 688128 out
[ 1633.454811]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454812] ata2.00: status: { DRDY }
[ 1633.454813] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454815] ata2.00: cmd 61/80:38:80:da:94/0a:00:55:00:00/40 tag 7 ncq 1376256 out
[ 1633.454815]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454816] ata2.00: status: { DRDY }
[ 1633.454816] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454818] ata2.00: cmd 61/40:40:00:e5:94/05:00:55:00:00/40 tag 8 ncq 688128 out
[ 1633.454818]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454819] ata2.00: status: { DRDY }
[ 1633.454820] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454821] ata2.00: cmd 61/40:48:40:ea:94/05:00:55:00:00/40 tag 9 ncq 688128 out
[ 1633.454821]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454822] ata2.00: status: { DRDY }
[ 1633.454823] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454825] ata2.00: cmd 61/80:50:80:ef:94/0a:00:55:00:00/40 tag 10 ncq 1376256 out
[ 1633.454825]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454826] ata2.00: status: { DRDY }
[ 1633.454826] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454828] ata2.00: cmd 61/40:58:00:fa:94/05:00:55:00:00/40 tag 11 ncq 688128 out
[ 1633.454828]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454829] ata2.00: status: { DRDY }
[ 1633.454830] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454832] ata2.00: cmd 61/40:60:40:ff:94/05:00:55:00:00/40 tag 12 ncq 688128 out
[ 1633.454832]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454832] ata2.00: status: { DRDY }
[ 1633.454833] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454835] ata2.00: cmd 61/40:68:80:04:95/05:00:55:00:00/40 tag 13 ncq 688128 out
[ 1633.454835]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454836] ata2.00: status: { DRDY }
[ 1633.454837] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454838] ata2.00: cmd 61/40:70:c0:09:95/05:00:55:00:00/40 tag 14 ncq 688128 out
[ 1633.454838]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454839] ata2.00: status: { DRDY }
[ 1633.454840] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454842] ata2.00: cmd 61/40:78:00:0f:95/06:00:55:00:00/40 tag 15 ncq 819200 out
[ 1633.454842]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454842] ata2.00: status: { DRDY }
[ 1633.454843] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454845] ata2.00: cmd 61/40:80:40:15:95/05:00:55:00:00/40 tag 16 ncq 688128 out
[ 1633.454845]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454846] ata2.00: status: { DRDY }
[ 1633.454847] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454848] ata2.00: cmd 61/40:88:80:1a:95/05:00:55:00:00/40 tag 17 ncq 688128 out
[ 1633.454848]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1633.454849] ata2.00: status: { DRDY }
[ 1633.454850] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454852] ata2.00: cmd 61/80:90:c0:1f:95/0a:00:55:00:00/40 tag 18 ncq 1376256 out
[ 1633.454852]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454853] ata2.00: status: { DRDY }
[ 1633.454853] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454855] ata2.00: cmd 61/80:98:40:2a:95/1f:00:55:00:00/40 tag 19 ncq 4128768 out
[ 1633.454855]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454856] ata2.00: status: { DRDY }
[ 1633.454857] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454858] ata2.00: cmd 61/40:a0:c0:49:95/05:00:55:00:00/40 tag 20 ncq 688128 out
[ 1633.454858]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454859] ata2.00: status: { DRDY }
[ 1633.454860] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454862] ata2.00: cmd 61/80:b0:00:05:94/2a:00:55:00:00/40 tag 22 ncq 5570560 out
[ 1633.454862]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454863] ata2.00: status: { DRDY }
[ 1633.454863] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454865] ata2.00: cmd 61/40:b8:80:2f:94/1a:00:55:00:00/40 tag 23 ncq 3440640 out
[ 1633.454865]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454866] ata2.00: status: { DRDY }
[ 1633.454867] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454869] ata2.00: cmd 61/40:c0:c0:49:94/05:00:55:00:00/40 tag 24 ncq 688128 out
[ 1633.454869]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454869] ata2.00: status: { DRDY }
[ 1633.454870] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454872] ata2.00: cmd 61/80:c8:00:4f:94/20:00:55:00:00/40 tag 25 ncq 4259840 out
[ 1633.454872]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1633.454873] ata2.00: status: { DRDY }
[ 1633.454874] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454875] ata2.00: cmd 61/80:d0:80:6f:94/0a:00:55:00:00/40 tag 26 ncq 1376256 out
[ 1633.454875]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1633.454876] ata2.00: status: { DRDY }
[ 1633.454877] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454879] ata2.00: cmd 61/c0:d8:00:7a:94/0f:00:55:00:00/40 tag 27 ncq 2064384 out
[ 1633.454879]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454880] ata2.00: status: { DRDY }
[ 1633.454880] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454882] ata2.00: cmd 61/80:e0:c0:89:94/0b:00:55:00:00/40 tag 28 ncq 1507328 out
[ 1633.454882]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454883] ata2.00: status: { DRDY }
[ 1633.454884] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454885] ata2.00: cmd 61/40:e8:40:95:94/05:00:55:00:00/40 tag 29 ncq 688128 out
[ 1633.454885]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1633.454886] ata2.00: status: { DRDY }
[ 1633.454887] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1633.454889] ata2.00: cmd 61/80:f0:80:9a:94/0a:00:55:00:00/40 tag 30 ncq 1376256 out
[ 1633.454889]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1633.454890] ata2.00: status: { DRDY }
[ 1633.454891] ata2: hard resetting link
[ 1633.774755] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1634.469553] ata2.00: configured for UDMA/133
[ 1634.469661] ata2.00: device reported invalid CHS sector 0
[ 1634.469669] ata2.00: device reported invalid CHS sector 0
[ 1634.469674] ata2.00: device reported invalid CHS sector 0
[ 1634.469679] ata2.00: device reported invalid CHS sector 0
[ 1634.469685] ata2.00: device reported invalid CHS sector 0
[ 1634.469689] ata2.00: device reported invalid CHS sector 0
[ 1634.469694] ata2.00: device reported invalid CHS sector 0
[ 1634.469699] ata2.00: device reported invalid CHS sector 0
[ 1634.469705] ata2.00: device reported invalid CHS sector 0
[ 1634.469709] ata2.00: device reported invalid CHS sector 0
[ 1634.469713] ata2.00: device reported invalid CHS sector 0
[ 1634.469719] ata2.00: device reported invalid CHS sector 0
[ 1634.469723] ata2.00: device reported invalid CHS sector 0
[ 1634.469728] ata2.00: device reported invalid CHS sector 0
[ 1634.469732] ata2.00: device reported invalid CHS sector 0
[ 1634.469737] ata2.00: device reported invalid CHS sector 0
[ 1634.469741] ata2.00: device reported invalid CHS sector 0
[ 1634.469746] ata2.00: device reported invalid CHS sector 0
[ 1634.469751] ata2.00: device reported invalid CHS sector 0
[ 1634.469757] ata2.00: device reported invalid CHS sector 0
[ 1634.469760] ata2.00: device reported invalid CHS sector 0
[ 1634.469766] ata2.00: device reported invalid CHS sector 0
[ 1634.469772] ata2.00: device reported invalid CHS sector 0
[ 1634.469775] ata2.00: device reported invalid CHS sector 0
[ 1634.469781] ata2.00: device reported invalid CHS sector 0
[ 1634.469785] ata2.00: device reported invalid CHS sector 0
[ 1634.469790] ata2.00: device reported invalid CHS sector 0
[ 1634.469795] ata2.00: device reported invalid CHS sector 0
[ 1634.469800] ata2.00: device reported invalid CHS sector 0
[ 1634.469805] ata2.00: device reported invalid CHS sector 0
[ 1634.469832] ata2: EH complete
[ 1708.462804] ata2.00: exception Emask 0x0 SAct 0x7fffffdf SErr 0x0 action 0x6 frozen
[ 1708.462813] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462821] ata2.00: cmd 61/c0:00:40:ea:da/25:00:55:00:00/40 tag 0 ncq 4947968 out
[ 1708.462821]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462825] ata2.00: status: { DRDY }
[ 1708.462828] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462834] ata2.00: cmd 61/80:08:00:10:db/1c:00:55:00:00/40 tag 1 ncq 3735552 out
[ 1708.462834]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462837] ata2.00: status: { DRDY }
[ 1708.462840] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462846] ata2.00: cmd 61/80:10:80:2c:db/0a:00:55:00:00/40 tag 2 ncq 1376256 out
[ 1708.462846]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1708.462849] ata2.00: status: { DRDY }
[ 1708.462852] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462858] ata2.00: cmd 61/40:18:00:37:db/05:00:55:00:00/40 tag 3 ncq 688128 out
[ 1708.462858]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462860] ata2.00: status: { DRDY }
[ 1708.462863] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462869] ata2.00: cmd 61/c0:20:40:3c:db/0f:00:55:00:00/40 tag 4 ncq 2064384 out
[ 1708.462869]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462872] ata2.00: status: { DRDY }
[ 1708.462874] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462880] ata2.00: cmd 61/40:30:40:a7:d9/33:00:55:00:00/40 tag 6 ncq 6717440 out
[ 1708.462880]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462883] ata2.00: status: { DRDY }
[ 1708.462885] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462891] ata2.00: cmd 61/40:38:80:da:d9/05:00:55:00:00/40 tag 7 ncq 688128 out
[ 1708.462891]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462894] ata2.00: status: { DRDY }
[ 1708.462896] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462902] ata2.00: cmd 61/c0:40:c0:df:d9/0f:00:55:00:00/40 tag 8 ncq 2064384 out
[ 1708.462902]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462905] ata2.00: status: { DRDY }
[ 1708.462907] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462913] ata2.00: cmd 61/80:48:80:ef:d9/0a:00:55:00:00/40 tag 9 ncq 1376256 out
[ 1708.462913]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462916] ata2.00: status: { DRDY }
[ 1708.462918] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462924] ata2.00: cmd 61/00:50:00:fa:d9/08:00:55:00:00/40 tag 10 ncq 1048576 out
[ 1708.462924]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462927] ata2.00: status: { DRDY }
[ 1708.462930] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462935] ata2.00: cmd 61/40:58:00:02:da/05:00:55:00:00/40 tag 11 ncq 688128 out
[ 1708.462935]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462938] ata2.00: status: { DRDY }
[ 1708.462941] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462946] ata2.00: cmd 61/c0:60:40:07:da/08:00:55:00:00/40 tag 12 ncq 1146880 out
[ 1708.462946]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462949] ata2.00: status: { DRDY }
[ 1708.462952] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462957] ata2.00: cmd 61/80:68:00:10:da/0a:00:55:00:00/40 tag 13 ncq 1376256 out
[ 1708.462957]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462960] ata2.00: status: { DRDY }
[ 1708.462963] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462968] ata2.00: cmd 61/40:70:80:1a:da/05:00:55:00:00/40 tag 14 ncq 688128 out
[ 1708.462968]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462971] ata2.00: status: { DRDY }
[ 1708.462974] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462979] ata2.00: cmd 61/40:78:c0:1f:da/05:00:55:00:00/40 tag 15 ncq 688128 out
[ 1708.462979]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462982] ata2.00: status: { DRDY }
[ 1708.462985] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.462990] ata2.00: cmd 61/40:80:00:25:da/05:00:55:00:00/40 tag 16 ncq 688128 out
[ 1708.462990]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.462993] ata2.00: status: { DRDY }
[ 1708.462995] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463001] ata2.00: cmd 61/40:88:40:2a:da/05:00:55:00:00/40 tag 17 ncq 688128 out
[ 1708.463001]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1708.463004] ata2.00: status: { DRDY }
[ 1708.463007] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463012] ata2.00: cmd 61/40:90:80:2f:da/05:00:55:00:00/40 tag 18 ncq 688128 out
[ 1708.463012]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463015] ata2.00: status: { DRDY }
[ 1708.463018] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463023] ata2.00: cmd 61/40:98:c0:34:da/05:00:55:00:00/40 tag 19 ncq 688128 out
[ 1708.463023]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463026] ata2.00: status: { DRDY }
[ 1708.463028] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463034] ata2.00: cmd 61/80:a0:00:3a:da/0a:00:55:00:00/40 tag 20 ncq 1376256 out
[ 1708.463034]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463037] ata2.00: status: { DRDY }
[ 1708.463039] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463045] ata2.00: cmd 61/40:a8:80:44:da/05:00:55:00:00/40 tag 21 ncq 688128 out
[ 1708.463045]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1708.463048] ata2.00: status: { DRDY }
[ 1708.463050] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463056] ata2.00: cmd 61/40:b0:c0:49:da/05:00:55:00:00/40 tag 22 ncq 688128 out
[ 1708.463056]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463059] ata2.00: status: { DRDY }
[ 1708.463061] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463067] ata2.00: cmd 61/00:b8:00:4f:da/01:00:55:00:00/40 tag 23 ncq 131072 out
[ 1708.463067]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463070] ata2.00: status: { DRDY }
[ 1708.463072] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463078] ata2.00: cmd 61/40:c0:00:50:da/05:00:55:00:00/40 tag 24 ncq 688128 out
[ 1708.463078]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463081] ata2.00: status: { DRDY }
[ 1708.463083] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463089] ata2.00: cmd 61/40:c8:40:55:da/05:00:55:00:00/40 tag 25 ncq 688128 out
[ 1708.463089]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1708.463092] ata2.00: status: { DRDY }
[ 1708.463094] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463100] ata2.00: cmd 61/c0:d0:80:5a:da/0c:00:55:00:00/40 tag 26 ncq 1671168 out
[ 1708.463100]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1708.463103] ata2.00: status: { DRDY }
[ 1708.463105] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463111] ata2.00: cmd 61/80:d8:40:67:da/1f:00:55:00:00/40 tag 27 ncq 4128768 out
[ 1708.463111]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463114] ata2.00: status: { DRDY }
[ 1708.463116] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463122] ata2.00: cmd 61/80:e0:c0:86:da/0e:00:55:00:00/40 tag 28 ncq 1900544 out
[ 1708.463122]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463125] ata2.00: status: { DRDY }
[ 1708.463127] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463133] ata2.00: cmd 61/c0:e8:40:95:da/24:00:55:00:00/40 tag 29 ncq 4816896 out
[ 1708.463133]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1708.463136] ata2.00: status: { DRDY }
[ 1708.463138] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1708.463144] ata2.00: cmd 61/40:f0:00:ba:da/30:00:55:00:00/40 tag 30 ncq 6324224 out
[ 1708.463144]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1708.463147] ata2.00: status: { DRDY }
[ 1708.463152] ata2: hard resetting link
[ 1708.894676] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1709.820112] ata2.00: configured for UDMA/133
[ 1709.820218] ata2.00: device reported invalid CHS sector 0
[ 1709.820221] ata2.00: device reported invalid CHS sector 0
[ 1709.820223] ata2.00: device reported invalid CHS sector 0
[ 1709.820225] ata2.00: device reported invalid CHS sector 0
[ 1709.820226] ata2.00: device reported invalid CHS sector 0
[ 1709.820228] ata2.00: device reported invalid CHS sector 0
[ 1709.820229] ata2.00: device reported invalid CHS sector 0
[ 1709.820231] ata2.00: device reported invalid CHS sector 0
[ 1709.820233] ata2.00: device reported invalid CHS sector 0
[ 1709.820234] ata2.00: device reported invalid CHS sector 0
[ 1709.820235] ata2.00: device reported invalid CHS sector 0
[ 1709.820237] ata2.00: device reported invalid CHS sector 0
[ 1709.820238] ata2.00: device reported invalid CHS sector 0
[ 1709.820239] ata2.00: device reported invalid CHS sector 0
[ 1709.820240] ata2.00: device reported invalid CHS sector 0
[ 1709.820241] ata2.00: device reported invalid CHS sector 0
[ 1709.820242] ata2.00: device reported invalid CHS sector 0
[ 1709.820244] ata2.00: device reported invalid CHS sector 0
[ 1709.820245] ata2.00: device reported invalid CHS sector 0
[ 1709.820246] ata2.00: device reported invalid CHS sector 0
[ 1709.820247] ata2.00: device reported invalid CHS sector 0
[ 1709.820249] ata2.00: device reported invalid CHS sector 0
[ 1709.820250] ata2.00: device reported invalid CHS sector 0
[ 1709.820251] ata2.00: device reported invalid CHS sector 0
[ 1709.820252] ata2.00: device reported invalid CHS sector 0
[ 1709.820254] ata2.00: device reported invalid CHS sector 0
[ 1709.820255] ata2.00: device reported invalid CHS sector 0
[ 1709.820256] ata2.00: device reported invalid CHS sector 0
[ 1709.820258] ata2.00: device reported invalid CHS sector 0
[ 1709.820259] ata2.00: device reported invalid CHS sector 0
[ 1709.820268] ata2: EH complete
[ 1762.478653] ata2.00: NCQ disabled due to excessive errors
[ 1762.478667] ata2.00: exception Emask 0x0 SAct 0x7fff7fff SErr 0x0 action 0x6 frozen
[ 1762.478669] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478672] ata2.00: cmd 61/80:00:40:6a:f6/0a:00:55:00:00/40 tag 0 ncq 1376256 out
[ 1762.478672]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478673] ata2.00: status: { DRDY }
[ 1762.478674] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478676] ata2.00: cmd 61/40:08:c0:74:f6/05:00:55:00:00/40 tag 1 ncq 688128 out
[ 1762.478676]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478677] ata2.00: status: { DRDY }
[ 1762.478678] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478680] ata2.00: cmd 61/40:10:00:7a:f6/1b:00:55:00:00/40 tag 2 ncq 3571712 out
[ 1762.478680]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1762.478681] ata2.00: status: { DRDY }
[ 1762.478682] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478684] ata2.00: cmd 61/80:18:40:95:f6/0a:00:55:00:00/40 tag 3 ncq 1376256 out
[ 1762.478684]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478685] ata2.00: status: { DRDY }
[ 1762.478685] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478687] ata2.00: cmd 61/80:20:c0:9f:f6/0a:00:55:00:00/40 tag 4 ncq 1376256 out
[ 1762.478687]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478688] ata2.00: status: { DRDY }
[ 1762.478689] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478691] ata2.00: cmd 61/40:28:40:aa:f6/0e:00:55:00:00/40 tag 5 ncq 1867776 out
[ 1762.478691]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478692] ata2.00: status: { DRDY }
[ 1762.478693] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478695] ata2.00: cmd 61/80:30:80:b8:f6/17:00:55:00:00/40 tag 6 ncq 3080192 out
[ 1762.478695]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478695] ata2.00: status: { DRDY }
[ 1762.478696] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478698] ata2.00: cmd 61/c0:38:00:d0:f6/24:00:55:00:00/40 tag 7 ncq 4816896 out
[ 1762.478698]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478699] ata2.00: status: { DRDY }
[ 1762.478700] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478702] ata2.00: cmd 61/80:40:c0:f4:f6/20:00:55:00:00/40 tag 8 ncq 4259840 out
[ 1762.478702]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478703] ata2.00: status: { DRDY }
[ 1762.478704] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478705] ata2.00: cmd 61/40:48:40:15:f7/1a:00:55:00:00/40 tag 9 ncq 3440640 out
[ 1762.478705]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478706] ata2.00: status: { DRDY }
[ 1762.478707] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478709] ata2.00: cmd 61/00:50:80:2f:f7/15:00:55:00:00/40 tag 10 ncq 2752512 out
[ 1762.478709]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478710] ata2.00: status: { DRDY }
[ 1762.478711] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478713] ata2.00: cmd 61/00:58:80:44:f7/16:00:55:00:00/40 tag 11 ncq 2883584 out
[ 1762.478713]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478714] ata2.00: status: { DRDY }
[ 1762.478715] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478716] ata2.00: cmd 61/c0:60:80:5a:f7/18:00:55:00:00/40 tag 12 ncq 3244032 out
[ 1762.478716]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478717] ata2.00: status: { DRDY }
[ 1762.478718] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478720] ata2.00: cmd 61/80:68:40:73:f7/0a:00:55:00:00/40 tag 13 ncq 1376256 out
[ 1762.478720]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478721] ata2.00: status: { DRDY }
[ 1762.478722] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478724] ata2.00: cmd 61/80:70:c0:7d:f7/0a:00:55:00:00/40 tag 14 ncq 1376256 out
[ 1762.478724]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478725] ata2.00: status: { DRDY }
[ 1762.478725] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478727] ata2.00: cmd 61/c0:80:40:bf:f5/2e:00:55:00:00/40 tag 16 ncq 6127616 out
[ 1762.478727]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478728] ata2.00: status: { DRDY }
[ 1762.478729] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478731] ata2.00: cmd 61/40:88:00:ee:f5/1a:00:55:00:00/40 tag 17 ncq 3440640 out
[ 1762.478731]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1762.478732] ata2.00: status: { DRDY }
[ 1762.478733] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478735] ata2.00: cmd 61/c0:90:40:08:f6/07:00:55:00:00/40 tag 18 ncq 1015808 out
[ 1762.478735]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478736] ata2.00: status: { DRDY }
[ 1762.478737] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478738] ata2.00: cmd 61/40:98:00:10:f6/05:00:55:00:00/40 tag 19 ncq 688128 out
[ 1762.478738]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478739] ata2.00: status: { DRDY }
[ 1762.478740] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478742] ata2.00: cmd 61/40:a0:40:15:f6/05:00:55:00:00/40 tag 20 ncq 688128 out
[ 1762.478742]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478743] ata2.00: status: { DRDY }
[ 1762.478744] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478746] ata2.00: cmd 61/80:a8:80:1a:f6/0a:00:55:00:00/40 tag 21 ncq 1376256 out
[ 1762.478746]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1762.478747] ata2.00: status: { DRDY }
[ 1762.478747] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478749] ata2.00: cmd 61/80:b0:00:25:f6/0a:00:55:00:00/40 tag 22 ncq 1376256 out
[ 1762.478749]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478750] ata2.00: status: { DRDY }
[ 1762.478751] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478753] ata2.00: cmd 61/80:b8:80:2f:f6/0a:00:55:00:00/40 tag 23 ncq 1376256 out
[ 1762.478753]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478754] ata2.00: status: { DRDY }
[ 1762.478755] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478757] ata2.00: cmd 61/c0:c0:00:3a:f6/0f:00:55:00:00/40 tag 24 ncq 2064384 out
[ 1762.478757]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478758] ata2.00: status: { DRDY }
[ 1762.478758] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478760] ata2.00: cmd 61/40:c8:c0:49:f6/06:00:55:00:00/40 tag 25 ncq 819200 out
[ 1762.478760]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1762.478761] ata2.00: status: { DRDY }
[ 1762.478762] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478764] ata2.00: cmd 61/40:d0:00:50:f6/05:00:55:00:00/40 tag 26 ncq 688128 out
[ 1762.478764]          res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1762.478765] ata2.00: status: { DRDY }
[ 1762.478766] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478768] ata2.00: cmd 61/40:d8:40:55:f6/05:00:55:00:00/40 tag 27 ncq 688128 out
[ 1762.478768]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478769] ata2.00: status: { DRDY }
[ 1762.478769] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478771] ata2.00: cmd 61/40:e0:80:5a:f6/05:00:55:00:00/40 tag 28 ncq 688128 out
[ 1762.478771]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478772] ata2.00: status: { DRDY }
[ 1762.478773] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478775] ata2.00: cmd 61/40:e8:c0:5f:f6/05:00:55:00:00/40 tag 29 ncq 688128 out
[ 1762.478775]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1762.478776] ata2.00: status: { DRDY }
[ 1762.478777] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1762.478778] ata2.00: cmd 61/40:f0:00:65:f6/05:00:55:00:00/40 tag 30 ncq 688128 out
[ 1762.478778]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1762.478779] ata2.00: status: { DRDY }
[ 1762.478781] ata2: hard resetting link
[ 1763.022682] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1763.934823] ata2.00: configured for UDMA/133
[ 1763.934936] ata2.00: device reported invalid CHS sector 0
[ 1763.934938] ata2.00: device reported invalid CHS sector 0
[ 1763.934940] ata2.00: device reported invalid CHS sector 0
[ 1763.934941] ata2.00: device reported invalid CHS sector 0
[ 1763.934943] ata2.00: device reported invalid CHS sector 0
[ 1763.934944] ata2.00: device reported invalid CHS sector 0
[ 1763.934946] ata2.00: device reported invalid CHS sector 0
[ 1763.934947] ata2.00: device reported invalid CHS sector 0
[ 1763.934949] ata2.00: device reported invalid CHS sector 0
[ 1763.934951] ata2.00: device reported invalid CHS sector 0
[ 1763.934953] ata2.00: device reported invalid CHS sector 0
[ 1763.934954] ata2.00: device reported invalid CHS sector 0
[ 1763.934956] ata2.00: device reported invalid CHS sector 0
[ 1763.934957] ata2.00: device reported invalid CHS sector 0
[ 1763.934959] ata2.00: device reported invalid CHS sector 0
[ 1763.934960] ata2.00: device reported invalid CHS sector 0
[ 1763.934962] ata2.00: device reported invalid CHS sector 0
[ 1763.934963] ata2.00: device reported invalid CHS sector 0
[ 1763.934965] ata2.00: device reported invalid CHS sector 0
[ 1763.934966] ata2.00: device reported invalid CHS sector 0
[ 1763.934967] ata2.00: device reported invalid CHS sector 0
[ 1763.934969] ata2.00: device reported invalid CHS sector 0
[ 1763.934970] ata2.00: device reported invalid CHS sector 0
[ 1763.934971] ata2.00: device reported invalid CHS sector 0
[ 1763.934973] ata2.00: device reported invalid CHS sector 0
[ 1763.934974] ata2.00: device reported invalid CHS sector 0
[ 1763.934975] ata2.00: device reported invalid CHS sector 0
[ 1763.934976] ata2.00: device reported invalid CHS sector 0
[ 1763.934978] ata2.00: device reported invalid CHS sector 0
[ 1763.934979] ata2.00: device reported invalid CHS sector 0
[ 1763.934987] ata2: EH complete
[ 1827.374614] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1827.374628] ata2.00: failed command: WRITE DMA EXT
[ 1827.374631] ata2.00: cmd 35/00:c0:40:d5:19/00:28:56:00:00/e0 tag 29 dma 5341184 out
[ 1827.374631]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1827.374632] ata2.00: status: { DRDY }
[ 1827.374634] ata2: hard resetting link
[ 1827.806569] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1827.854797] ata2.00: configured for UDMA/133
[ 1827.854910] ata2.00: device reported invalid CHS sector 0
[ 1827.854914] ata2: EH complete
[ 1860.398557] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1860.398565] ata2.00: failed command: WRITE DMA EXT
[ 1860.398572] ata2.00: cmd 35/00:80:80:b0:21/00:34:56:00:00/e0 tag 30 dma 6881280 out
[ 1860.398572]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1860.398574] ata2.00: status: { DRDY }
[ 1860.398578] ata2: hard resetting link
[ 1860.718530] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1861.560847] ata2.00: configured for UDMA/133
[ 1861.560958] ata2.00: device reported invalid CHS sector 0
[ 1861.560971] ata2: EH complete
[ 1892.398591] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1892.398600] ata2.00: failed command: WRITE DMA EXT
[ 1892.398608] ata2.00: cmd 35/00:00:40:c0:22/00:2a:56:00:00/e0 tag 11 dma 5505024 out
[ 1892.398608]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1892.398612] ata2.00: status: { DRDY }
[ 1892.398617] ata2: hard resetting link
[ 1897.758499] ata2: link is slow to respond, please be patient (ready=0)
[ 1902.406499] ata2: COMRESET failed (errno=-16)
[ 1902.406506] ata2: hard resetting link
[ 1907.766490] ata2: link is slow to respond, please be patient (ready=0)
[ 1912.414544] ata2: COMRESET failed (errno=-16)
[ 1912.414556] ata2: hard resetting link
[ 1917.774496] ata2: link is slow to respond, please be patient (ready=0)
[ 1947.454483] ata2: COMRESET failed (errno=-16)
[ 1947.454495] ata2: limiting SATA link speed to 3.0 Gbps
[ 1947.454500] ata2: hard resetting link
[ 1952.478482] ata2: COMRESET failed (errno=-16)
[ 1952.478493] ata2: reset failed, giving up
[ 1952.478498] ata2.00: disabled
[ 1952.478508] ata2.00: device reported invalid CHS sector 0
[ 1952.478522] ata2: EH complete
[ 1952.479208] sd 1:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.479216] sd 1:0:0:0: [sda] tag#13 CDB: Write(16) 8a 00 00 00 00 00 56 22 c0 40 00 00 2a 00 00 00
[ 1952.479220] blk_update_request: I/O error, dev sda, sector 1445118016
[ 1952.479266] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 380370944 size 5505024 starting block 180639240)
[ 1952.479271] Buffer I/O error on device dm-3, logical block 180639240
[ 1952.479277] Buffer I/O error on device dm-3, logical block 180639241
[ 1952.479281] Buffer I/O error on device dm-3, logical block 180639242
[ 1952.479284] Buffer I/O error on device dm-3, logical block 180639243
[ 1952.479287] Buffer I/O error on device dm-3, logical block 180639244
[ 1952.479290] Buffer I/O error on device dm-3, logical block 180639245
[ 1952.479293] Buffer I/O error on device dm-3, logical block 180639246
[ 1952.479296] Buffer I/O error on device dm-3, logical block 180639247
[ 1952.479300] Buffer I/O error on device dm-3, logical block 180639248
[ 1952.479303] Buffer I/O error on device dm-3, logical block 180639249
[ 1952.479463] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 380370944 size 5505024 starting block 180639408)
[ 1952.479629] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 380370944 size 5505024 starting block 180639576)
[ 1952.479796] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180639744)
[ 1952.479953] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180639912)
[ 1952.480116] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180640080)
[ 1952.480271] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180640248)
[ 1952.480435] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180640416)
[ 1952.480588] sd 1:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.480593] sd 1:0:0:0: [sda] tag#14 CDB: Write(16) 8a 00 00 00 00 00 56 22 ea 40 00 00 05 40 00 00
[ 1952.480596] blk_update_request: I/O error, dev sda, sector 1445128768
[ 1952.480643] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180640584)
[ 1952.480837] sd 1:0:0:0: [sda] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.480842] sd 1:0:0:0: [sda] tag#15 CDB: Write(16) 8a 00 00 00 00 00 56 22 ef 80 00 00 2b 00 00 00
[ 1952.480845] blk_update_request: I/O error, dev sda, sector 1445130112
[ 1952.480895] EXT4-fs warning (device dm-3): ext4_end_bio:317: I/O error -5 writing to inode 230687853 (offset 385875968 size 8388608 starting block 180640752)
[ 1952.482184] sd 1:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.482190] sd 1:0:0:0: [sda] tag#16 CDB: Write(16) 8a 00 00 00 00 00 56 23 1a 80 00 00 05 40 00 00
[ 1952.482193] blk_update_request: I/O error, dev sda, sector 1445141120
[ 1952.482417] sd 1:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.482422] sd 1:0:0:0: [sda] tag#17 CDB: Write(16) 8a 00 00 00 00 00 56 23 1f c0 00 00 10 c0 00 00
[ 1952.482426] blk_update_request: I/O error, dev sda, sector 1445142464
[ 1952.483019] sd 1:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.483024] sd 1:0:0:0: [sda] tag#18 CDB: Write(16) 8a 00 00 00 00 00 56 23 30 80 00 00 24 c0 00 00
[ 1952.483027] blk_update_request: I/O error, dev sda, sector 1445146752
[ 1952.484124] sd 1:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.484129] sd 1:0:0:0: [sda] tag#19 CDB: Write(16) 8a 00 00 00 00 00 56 23 55 40 00 00 0a 80 00 00
[ 1952.484132] blk_update_request: I/O error, dev sda, sector 1445156160
[ 1952.484479] sd 1:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.484484] sd 1:0:0:0: [sda] tag#20 CDB: Write(16) 8a 00 00 00 00 00 56 23 5f c0 00 00 05 40 00 00
[ 1952.484486] blk_update_request: I/O error, dev sda, sector 1445158848
[ 1952.484679] sd 1:0:0:0: [sda] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.484683] sd 1:0:0:0: [sda] tag#21 CDB: Write(16) 8a 00 00 00 00 00 56 23 65 00 00 00 0a 80 00 00
[ 1952.484686] blk_update_request: I/O error, dev sda, sector 1445160192
[ 1952.485039] sd 1:0:0:0: [sda] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1952.485043] sd 1:0:0:0: [sda] tag#22 CDB: Write(16) 8a 00 00 00 00 00 56 23 6f 80 00 00 0f c0 00 00
[ 1952.485046] blk_update_request: I/O error, dev sda, sector 1445162880
[ 1952.504835] Buffer I/O error on dev dm-3, logical block 1845493761, lost async page write
[ 1952.504846] Buffer I/O error on dev dm-3, logical block 1845559025, lost async page write
[ 1952.504854] Buffer I/O error on dev dm-3, logical block 180355075, lost async page write
[ 1952.504859] Buffer I/O error on dev dm-3, logical block 180355076, lost async page write
[ 1952.504863] Aborting journal on device dm-3-8.
[ 1952.504865] Buffer I/O error on dev dm-3, logical block 180355077, lost async page write
[ 1952.504869] Buffer I/O error on dev dm-3, logical block 180355078, lost async page write
[ 1952.504973] Buffer I/O error on dev dm-3, logical block 976781312, lost sync page write
[ 1952.505004] JBD2: Error -5 detected when updating journal superblock for dm-3-8.
[ 1952.505634] EXT4-fs error (device dm-3) in ext4_reserve_inode_write:4837: Journal has aborted
[ 1952.506684] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 1952.506952] EXT4-fs error (device dm-3): mpage_map_and_submit_extent:2128: comm kworker/u8:0: Failed to mark inode 230687853 dirty
[ 1952.506957] EXT4-fs (dm-3): previous I/O error to superblock detected
[ 1952.507076] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 1952.507097] EXT4-fs error (device dm-3) in ext4_writepages:2419: Journal has aborted
[ 1952.507100] EXT4-fs (dm-3): previous I/O error to superblock detected
[ 1952.507164] Buffer I/O error on dev dm-3, logical block 0, lost sync page write
[ 1952.507319] EXT4-fs (dm-3): previous I/O error to superblock detected
[ 1952.507408] EXT4-fs error (device dm-3): ext4_journal_check_start:56: Detected aborted journal
[ 1952.507414] EXT4-fs (dm-3): Remounting filesystem read-only
[ 1952.507417] EXT4-fs (dm-3): previous I/O error to superblock detected
[ 1952.507507] EXT4-fs (dm-3): ext4_writepages: jbd2_start: 1024 pages, ino 230687854; err -30

Thank you!
Comment 4 Jens 2015-04-12 10:52:34 UTC
Update: error also occurs when increasing timeout in /sys/block/sda/device/timeout to 120 (seconds). Will try 240s next. (But is this a solution?)
Comment 5 Jens 2015-04-12 11:19:39 UTC
Increasing the timeout to 240s brings the following error after about 15min of disk write activity:

[ 1080.132256] INFO: task jbd2/dm-3-8:2557 blocked for more than 120 seconds.
[ 1080.132265]       Tainted: G           OE   4.0.0-rc7+ #7
[ 1080.132268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.132271] jbd2/dm-3-8     D ffff880213f43a98     0  2557      2 0x00000000
[ 1080.132280]  ffff880213f43a98 ffffffff81c164e0 ffff880210d3a800 0000000000000000
[ 1080.132287]  ffff880213f43fd8 0000000000000000 7fffffffffffffff 0000000000000002
[ 1080.132292]  ffffffff817b57a0 ffff880213f43ab8 ffffffff817b4f77 ffff880213f43ac8

and

[ 1175.528192] ata2.00: exception Emask 0x0 SAct 0x7ffeffff SErr 0x0 action 0x6 frozen
[ 1175.528198] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1175.528202] ata2.00: cmd 61/c0:00:c0:34:cf/0f:00:74:00:00/40 tag 0 ncq 2064384 out
[ 1175.528202]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

repeated with "tag 1" .. "tag 31" 

[ 1175.528383] ata2: hard resetting link
[ 1175.964132] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1176.748041] ata2.00: configured for UDMA/133
[ 1176.748138] ata2.00: device reported invalid CHS sector 0
[ 1176.748142] ata2.00: device reported invalid CHS sector 0

When write activity pauses, the disk LED is constantly lit and the disk sounds as if it were "doing something", so the pauses are apparently OK. But Linux doesn't think so and eventually times out.
Comment 6 Jens 2015-04-18 18:54:34 UTC
According to this article http://www.storagereview.com/seagate_archive_hdd_review_8tb, the disk seems to work as designed - and the OS should be able to cope with long(er) timeouts because the disk needs time to perform an internal garbage collection after large writes, similar to TRIM on SSDs. However, in their tests the write performance was bad but writes never completely stopped so there were no timeouts (as far as I can see) like in my case.

I need to decide within the next week whether I keep this disk or exchange it for  a smaller non-shingled disk at a slightly higher price (e.g. WD Green 6TB): is this a pure Linux issue, should I keep the disk and wait for Linux to support it, or are chances that I got another bad disk and I should give up on Seagate (again)?

Please reply.

Thank you!
Comment 7 Jens 2015-04-19 20:20:07 UTC
See also http://serverfault.com/questions/682061.
Comment 8 Christian 2015-05-08 11:33:54 UTC
I have the same failure with the same drive and newer kernels. But kernel 3.16.0-4-amd (Debian build) and 3.18.10 had uptimes of 19 and 26 days respectly. I have tried all new kernels since, but they crash my system (including 4.0.1).
Comment 9 Johannes 2015-07-06 06:42:45 UTC
I'm experiencing similar problems with three different Seagate Archive 8TB drives. I used 3.18 patched with multipath-tcp to fill the drives with existing data and switched to 4.0.7 afterwards to make use of overlayfs merged in 3.19. After this change, I was unable to successfully write larger amounts of data (>10GiB) to the disk without having severe problems like others mentioned before.
I support the idea that the drive is doing some kind of internal garbage collection between writes and seems unaccessible during that time. This can even be observed by running the drives with 4.0 until the error occurs and then rebooting to 3.18. The drive will still be unaccessible after rebooting and spit out ATA errors. Even my UEFI doesn't show the drive identifier at that point. After powering down the system for a few seconds, the drive seems to continue to work just fine.

Thanks for filing this bug, I was unable to connect the dots myself and would have never guessed that this is actually caused by the kernel. I started to replace hardware and was going crazy about this problem.

I'm recreating my SnapRAID parity drive at the moment and the parity drive has written over 2TiB without a single error in dmesg on 3.18.6.

I also raised this issue with someone at Seagate, I hope they're interested in this too.

Thank you!
Comment 10 Ciarán Coffey 2015-07-06 13:44:03 UTC
Just to echo a "me too".

I've had the same expierence on 4.0.7 and 4.0.6.

3.14 works without issue.
I have some more notes here.
https://bbs.archlinux.org/viewtopic.php?pid=1542409#p1542409
Comment 11 warpme 2015-07-18 13:06:28 UTC
Just another data point: I have exactly the same issue as OP with 4.0.4 and 4.0.8 kernels. As many ppl are reporting: 3.14/3.16/3.18=OK; 3.19/4.x=NOK I definitely think this is kernel regression in kernel support(tollerance) for SMR DM drives.
Comment 12 warpme 2015-07-22 14:39:30 UTC
Following patch (0006-revert-Implement-ATA_DEV_ZAC.patch) on 4.0.8 allows me to successful write 7T of test data without any SATA error. Without patch drive interface goes down after 50-150G of writes.
Patch effectively reverts:

From 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 5 Nov 2014 13:08:21 +0100
Subject: libata: Implement ATA_DEV_ZAC 

and

From f9ca5ab832e7ac5bc2b6fe0e82ad46d536f436f9 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 5 Nov 2014 13:08:22 +0100
Subject: libata-scsi: Update SATL for ZAC drives

I don't have time to go deeper but looking on code I think issue might be around line 1965 in libata-scsi.c

working:
<------>const u8 versions[] = {
<------><------>0x60,<->/* SAM-3 (no version claimed) */

<------><------>0x03,
<------><------>0x20,<->/* SBC-2 (no version claimed) */

<------><------>0x02,
<------><------>0x60<-->/* SPC-3 (no version claimed) */
<------>};

non-working:
<------>const u8 versions[] = {
<------><------>0x00,
<------><------>0x60,<->/* SAM-3 (no version claimed) */

<------><------>0x03,
<------><------>0x20,<->/* SBC-2 (no version claimed) */

<------><------>0x02,
<------><------>0x60<-->/* SPC-3 (no version claimed) */
<------>};
Comment 13 warpme 2015-07-22 14:40:53 UTC
Created attachment 183241 [details]
Workaround patch for kernel 4.0+
Comment 14 joergn 2015-08-19 12:27:50 UTC
Hi,

Thanks for finding this solution. I have two questions:

1. I guess this patch is not the final solution, because it is removing previously added functionality that, to my understanding, was supposed to help for handling SMR drives. Wouldn't this be needed  sooner or later? Is this issue further investigated?

2. I have checked all releases up to kernel v4.2 rc7. The problematic code is still in everywhere after v3.18. Does anybody have information when this issue would be fixed for good in the mainline Kernel?

Best regards, Jörg
Comment 15 Arzeets 2015-08-22 03:23:03 UTC
Got this problem with 2 different Samsung 8Tb SMR drives. The issue is very annoying. Any chance to see it fixed?
Comment 16 sweeneygj 2015-09-04 10:24:22 UTC
Created attachment 186701 [details]
Bisect to locate error for Seagate ST8000AS0002 archive drive

description: Motherboard
product: M4A88TD-M EVO

Model Family:     Seagate Archive HDD
Device Model:     ST8000AS0002-1NA17Z
Firmware Version: AR13

Bisect log attached.

The last working (good) version I'm sure has the commits that have previously been recommended to be reverted.

With some failures, drive is accessible and responds to smartctl other times it's not. Upon failure the machine must be switched off to get access to the drive. Even then the drive can sometimes report a smart error in bios. It will then restore itself.
Comment 17 sweeneygj 2015-09-06 09:23:11 UTC
Using the patch at http://marc.info/?l=linux-scsi&m=144062955906401&w=2

git checkout v3.19 
git apply <patch>

compiled but didn't fix the problem.

git checkout v3.19 
git revert f9ca5ab832e7ac5bc2b6fe0e82ad46d536f436f9 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb

compiled but didn't fix the problem.

The smart end-to-end error attribute has started to climb on the drive so will switch back to 3.18 to eliminate hardware error.
Comment 18 Arzeets 2015-09-06 15:31:08 UTC
sweeneygj@, if you have a chance, could you please bisect this issue between v3.18 and v3.19 tags?
Comment 19 sweeneygj 2015-09-06 20:45:15 UTC
(In reply to Arzeets from comment #18)
> sweeneygj@, if you have a chance, could you please bisect this issue between
> v3.18 and v3.19 tags?

Thought I did that: https://bugzilla.kernel.org/attachment.cgi?id=186701
Comment 20 sweeneygj 2015-09-06 20:47:54 UTC
(In reply to sweeneygj from comment #17)

> The smart end-to-end error attribute has started to climb on the drive so
> will switch back to 3.18 to eliminate hardware error.

Successfully wrote 600GB with 3.18.21. Drive appears OK.
Comment 21 Arzeets 2015-09-06 21:00:47 UTC
Sorry, missed your previous comment. The commit you mentioned has been reverted in 4.3 tree https://lkml.org/lkml/2015/7/20/566

Did you check if reverting this commit at recent kernels (e.g. v4.2) fixes the problem?
Comment 22 sweeneygj 2015-09-07 12:21:04 UTC
> Did you check if reverting this commit at recent kernels (e.g. v4.2) fixes
> the problem?


git checkout v4.2
git revert 34b48db66e08ca1c1bc07cf305d672ac940268dc

uname -r
4.2.0-gjs-archive--00001-g6a62234-dirty

After ~3G of writing 'WRITE FPDMA QUEUED' errors and 'device reported invalid CHS sector 0' reported in log.

This is just a ddrescue of a drive. In either case (fail/ok) the drive pauses after the first ~3G of data is written. Drive responds to smartctl.
Comment 23 sweeneygj 2015-09-07 16:35:35 UTC
(In reply to sweeneygj from comment #22)
> > Did you check if reverting this commit at recent kernels (e.g. v4.2) fixes
> > the problem?
> 
> 
> git checkout v4.2
> git revert 34b48db66e08ca1c1bc07cf305d672ac940268dc
> 
> uname -r
> 4.2.0-gjs-archive--00001-g6a62234-dirty
> 
> After ~3G of writing 'WRITE FPDMA QUEUED' errors and 'device reported
> invalid CHS sector 0' reported in log.

Whereas ...

git checkout v3.19
git revert 34b48db66e08ca1c1bc07cf305d672ac940268dc
uname -r
3.19.0-gjs-archive--00001-g25ec59c

Wrote 100GB OK without problems.
Comment 24 Arzeets 2015-09-07 18:44:58 UTC
It sounds like there is a regression in v3.19..v4.2 and another bisect is needed.
Comment 25 sweeneygj 2015-09-14 09:12:52 UTC
Created attachment 187531 [details]
Bisect 4.1 to 4.2 to locate 2nd Seagate SMR bug

#For each bisect test

git revert --no-commit 34b48db66e08

#run test

git reset --hard
Comment 26 sweeneygj 2015-09-14 09:19:51 UTC
(In reply to Arzeets from comment #24)
> It sounds like there is a regression in v3.19..v4.2 and another bisect is
> needed.

git checkout v4.2
git revert --no-commit 34b48db66e08 4f258a46346c

1.2TB of data wrote without error.

https://lkml.org/lkml/2015/8/19/324 thinks 4f258a46346c may be cause of an error too.
Comment 27 sweeneygj 2015-09-21 13:10:41 UTC
Both v4.3-rc1 and v4.3-rc2 require 

git revert --no-commit 4f258a46346c

for ST8000AS0002 to work without error.
Comment 28 Tejun Heo 2015-09-21 18:37:38 UTC
Margin, the offending commit is

 commit 4f258a46346c03fa0bbb6199ffaf4e1f9f599660
 Author: Martin K. Petersen <martin.petersen@oracle.com>
 Date:   Tue Jun 23 12:13:59 2015 -0400

    sd: Fix maximum I/O size for BLOCK_PC requests

Any ideas?
Comment 29 warpme 2015-09-21 18:45:17 UTC
Hmm, it looks like my hypothesis in Comment #12 was wrong. With 2 commits mentioned in comment #12 I still have problem - but very rare. 

However patch from http://marc.info/?l=linux-scsi&m=144062955906401&w=2 gives me stable operation (I/m not able to trigger NCQ issue with any tests like 15T of constantly writing data or heavy random r/w operations). 

Can somebody do test with http://marc.info/?l=linux-scsi&m=144062955906401&w=2 patch?
Comment 30 sweeneygj 2015-09-21 21:01:58 UTC
(In reply to warpme from comment #29)

> Can somebody do test with
> http://marc.info/?l=linux-scsi&m=144062955906401&w=2 patch?

See https://bugzilla.kernel.org/show_bug.cgi?id=93581#c17
Comment 31 warpme 2015-09-24 09:10:07 UTC
Right. Today (after 2 weeks of no-problems) I received series of IO_errors leading to XFS shutdown. It looks like with  http://marc.info/?l=linux-scsi&m=144062955906401&w=2 patch, bug is still present - only different load is needed to trigger it. Today I'll build 4.1.9 with reverted 4f258a46346c03fa0bbb6199ffaf4e1f9f599660 and do some tests.
Comment 32 sweeneygj 2015-09-24 09:34:48 UTC
(In reply to warpme from comment #31)
> Right. Today (after 2 weeks of no-problems) I received series of IO_errors
> leading to XFS shutdown. It looks like with 
> http://marc.info/?l=linux-scsi&m=144062955906401&w=2 patch, bug is still
> present - only different load is needed to trigger it. Today I'll build
> 4.1.9 with reverted 4f258a46346c03fa0bbb6199ffaf4e1f9f599660 and do some
> tests.


See comment #21 and comment #25.

34b48db66e08 is reverted in 4.3

Revert 34b48db66e08 in 4.1.x to test 4f258a46346c.  

For my ddrescue task, at some point between 3GB and 15GB the error would occur and the drive would be inaccessible.

Can you also check smart data before and after - does the end-to-end value change?
Comment 33 warpme 2015-09-25 15:23:59 UTC
FYI: after reverting 34b48db66e08 and 4f258a46346c on 4.1.8 I successfully passed 134 iterations of test. Vanilla kernel kills drive after 3-5 tests...
Comment 34 sweeneygj 2015-09-25 20:22:36 UTC
(In reply to warpme from comment #33)
> FYI: after reverting 34b48db66e08 and 4f258a46346c on 4.1.8 I successfully
> passed 134 iterations of test. Vanilla kernel kills drive after 3-5 tests...

Can you test 4.3-rc2 with and without reverting 4f258a46346c?
Comment 35 warpme 2015-09-26 09:41:11 UTC
oh, I have set of device drivers specific patches + some additional kernel modules. All this needs to be ported to new kernel ver. All this is on running 24/7 home server serving TV/media/surveillance/Internet on whole home. I will try but might be not easy...
Comment 36 sweeneygj 2015-09-26 20:51:55 UTC
(In reply to warpme from comment #35)
> oh, I have set of device drivers specific patches + some additional kernel
> modules. All this needs to be ported to new kernel ver. All this is on
> running 24/7 home server serving TV/media/surveillance/Internet on whole
> home. I will try but might be not easy...

Easier to just try 4.1.8 with only 34b48db66e08 reverted.
Comment 37 Martin K. Petersen 2015-09-28 23:13:40 UTC
Sorry about the delay. Just got back from vacation.

I'm a bit confused as to why 4f258a46346c would be the culprit as it should only affect sizing of BLOCK_PC requests which none of these are presumably are (I see WRITE FPDMA QUEUED in the log above which would be a REQ_TYPE_FS).

The only way my patch would change the behavior for FS requests would be if somehow the ata host did not set max_hw_sectors and is using the new, larger default introduced by Christoph's patch. But I don't see anything obvious in libata that would suggest max_hw_sectors is not getting set.

Anyway. I'll look at it in detail tomorrow.
Comment 38 Ciarán Coffey 2015-09-29 08:29:55 UTC
Worth noting, I tried these drives with NCQ disabled, and they still behaved the same way. I can't test anymore as I'm using the drive. 
I'm running with an old kernel for now (3.14)
Comment 39 Futuref 2015-09-30 12:45:26 UTC
What about kernel 3.19? Is there anything I can do to fix/workaround this issue? I have installed Ubuntu server 15.04 on x86-64 platform.
Comment 40 Futuref 2015-10-06 15:52:00 UTC
Any news?
Comment 41 joergn 2015-10-06 22:12:27 UTC
Hi Futuref,

I am also running Ubuntu and have the same problems. There are also older Kernels for Ubuntu 15.04 (Vivid). See:
http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=M;O=D
I did downgrade Ubuntu 15.04 to 3.18.x kernels and that works well for me. The problem seems to start with the 3.19 Kernels and I do not have any problems with the Seagate Archive drives and the 3.18 Kernel. I also did not find a good work-around for the 3.19 and newer kernels.
For the time being this works well for me and I plan to stay at this kernel version until the problem is solved and the solution made it into the Ubuntu mainline kernel.
Comment 42 Futuref 2015-10-08 21:57:07 UTC
Hello joergn,

I was excatly looking for such concrete instructions :)
Doing the downgrade to 3.18.x made it work again.

The drive just goes and goes...

Thank you so much!
Comment 43 Warren Hughes 2015-10-13 06:23:42 UTC
Here's another data point. I'd been experiencing the same issues (albeit with BTRFS). Now running kernel 4.3rc5-1 and all is well again. Discussion is here: https://mail-archive.com/linux-btrfs@vger.kernel.org/msg47341.html
Comment 44 sweeneygj 2015-10-14 08:17:17 UTC
(In reply to Warren Hughes from comment #43)
> Here's another data point. I'd been experiencing the same issues (albeit
> with BTRFS). Now running kernel 4.3rc5-1 and all is well again. Discussion
> is here: https://mail-archive.com/linux-btrfs@vger.kernel.org/msg47341.html

According to that thread, you started with 4.1.10 and moved to 4.3-rc4. 

34b48db66e08 is reverted in 4.3 which was the original issue.

Now 4f258a46346c is suspect in 4.3.

Not a BTRFS user but its wiki states data chunks are 1GiB, presumably that is  what the balance is 'moving'.

The ddrescue task, comment #32, throws errors anywhere between 3GB and 15GB on ext4 on LUKS.

Can you run ddrescue writing >15GB without error using 4.3rc5-1?
Comment 45 Warren Hughes 2015-10-14 18:44:31 UTC
(In reply to sweeneygj from comment #44)
> (In reply to Warren Hughes from comment #43)
> > Here's another data point. I'd been experiencing the same issues (albeit
> > with BTRFS). Now running kernel 4.3rc5-1 and all is well again. Discussion
> > is here: https://mail-archive.com/linux-btrfs@vger.kernel.org/msg47341.html
> 
> According to that thread, you started with 4.1.10 and moved to 4.3-rc4. 
> 
> 34b48db66e08 is reverted in 4.3 which was the original issue.
> 
> Now 4f258a46346c is suspect in 4.3.
> 
> Not a BTRFS user but its wiki states data chunks are 1GiB, presumably that
> is  what the balance is 'moving'.
> 
> The ddrescue task, comment #32, throws errors anywhere between 3GB and 15GB
> on ext4 on LUKS.
> 
> Can you run ddrescue writing >15GB without error using 4.3rc5-1?

I'd love to but both my SMR drives are part of my BTRFS pool so I can't direct a write to a single drive :-(
Comment 46 Arzeets 2015-10-14 18:51:39 UTC
If you have BTRFS then 'balance' operation is an easy way to trigger this error. With 4.1 kernel 'btrfs balance' fails after 10 minutes for me. Haven't tried recent kernel.
Comment 47 Warren Hughes 2015-10-14 19:50:24 UTC
(In reply to Arzeets from comment #46)
> If you have BTRFS then 'balance' operation is an easy way to trigger this
> error. With 4.1 kernel 'btrfs balance' fails after 10 minutes for me.
> Haven't tried recent kernel.

OK great, will be starting a balance today - will report back
Comment 48 Mario Kicherer 2015-10-25 19:30:23 UTC
Unfortunately, I also see these errors with 3.18.22 (-31822-lowlatency) on Ubuntu 15.04 with a Seagate 8 TB drive. I will try if 3.18.21 works as others said this version was working for them.
Comment 49 joergn 2015-10-25 19:58:46 UTC
Hi Mario and others,

I am on Ubuntu 15.04 and Kernel 3.18.22 since ubuntu made that version available on their server. I have 3 of the 8TB drives and they were stable with all the 3.18 Kernels I tried. I must however say that I did not do a serious stress test but use them by pushing larger media files to the drives. Maybe I am just lucky. Any later Kernel I tried did cause massive problems with my usage profile.

What did you do in order to see the error?

Has anybody tried Ubuntu 15.10 with its 4.2 Kernel yet? I am a little bit hesitant to upgrade because of the potential problems.
Comment 50 Mario Kicherer 2015-10-26 17:08:42 UTC
I'm not 100% sure it is the same fault but my headless machine died again today with 3.18.21-generic while doing a BTRFS send/receive to the 8 TB drive. After a reboot, df says it stopped at 275 GB of ~5TB total amount of data to be transferred. The machine had no other duties. Before the send/receive, a ~14h long SMART self-test completed and it has found no errors.
Comment 51 Mario Kicherer 2015-10-26 17:29:29 UTC
A related response from a Seagate engineer... in April :(

http://marc.info/?l=linux-ide&m=142956580918560&w=2

I will try if it works if I suspend the send/receive from time to time so the drive can flush its buffers.
Comment 52 Futuref 2015-10-27 11:47:00 UTC
Why not getting in contact with Seagate directly? Are there any devs who could talk to Seagate kernel spezialists?

I have written an email to adrian.palmer@seagate.com, but did not get a response yet.

I guess it would help to build up a sustainable solution.
Comment 53 Johannes 2015-10-28 20:01:47 UTC
Created attachment 191421 [details]
dmesg of the session where I'm experiencing problems on 3.14, ncq disabled

I just replaced one of my 8TB Archive HDDs because it was faulty, the drive wouldn't even start properly in the end. I assume this was unrelated to the problem here.

Now that I've gotten the replacement, I've added partitions, encrypted the partition with dm-crypt and created an ext4 filesystem on top of that.
Then I started recovering my data via SnapRaid, after around 200GiB the sata errors started to come back up. I attached the complete dmesg for reference.

I'm using kernel 3.14.51 with NCQ disabled.
I've talked to Adrian Palmer in the past, I also tried to make him aware of the problems we're having here. I really hope I can get this drive to work sometime soon without having to constantly worry about problems...
Comment 54 sweeneygj 2015-10-28 21:59:54 UTC
(In reply to Johannes from comment #53)
> Created attachment 191421 [details]
> dmesg of the session where I'm experiencing problems on 3.14, ncq disabled

No "device reported invalid CHS sector 0" which I and the OP get.

> Then I started recovering my data via SnapRaid, after around 200GiB the sata
> errors started to come back up. I attached the complete dmesg for reference.

Is the drive responding to smartctl?
Comment 55 Johannes 2015-10-29 05:26:12 UTC
(In reply to sweeneygj from comment #54) 
> Is the drive responding to smartctl?

Yes, although slower than usual, the drive was responding to smartctl -a.

The errors you're seeing where also coming up continously even after I unmounted and luksClosed the device. I eventually shut the machine down.
Comment 56 sweeneygj 2015-10-29 11:07:15 UTC
(In reply to Johannes from comment #55)
> (In reply to sweeneygj from comment #54) 
> > Is the drive responding to smartctl?
> 
> Yes, although slower than usual, the drive was responding to smartctl -a.

Does it report any reallocated sectors, pending or uncorrectable sectors?

For 3.14.51, I would 

shred -v -n 1 /dev/NEW_DRIVE

and see if that completes.
Comment 57 sweeneygj 2015-10-29 11:09:01 UTC
(In reply to Johannes from comment #53)

> Then I started recovering my data via SnapRaid, after around 200GiB the sata
> errors started to come back up. I attached the complete dmesg for reference.
> 
> I'm using kernel 3.14.51 with NCQ disabled.


Same usage pattern here, different kernel: 4.3-rc6 built using comment #27.
Comment 58 Johannes 2015-10-29 18:48:15 UTC
Created attachment 191501 [details]
Logs of the session a day after Comment #53

I just started the machine again and tried to continue my SnapRAID restore procedure. I attached the dmesg of that session, after very few MiB written and read, the drive started to throw errors again. You can seen this in the upper area of the attached log.

After rebooting the machine (what else can I do at this point), the drive started spitting these even before I unlocked my root volume. You will find this in the lower area of the file.

[    6.270466] ata2: link is slow to respond, please be patient (ready=0)
[   10.887959] ata2: COMRESET failed (errno=-16)
[   16.275040] ata2: link is slow to respond, please be patient (ready=0)
[   20.892532] ata2: COMRESET failed (errno=-16)
[   26.279609] ata2: link is slow to respond, please be patient (ready=0)
[   55.903535] ata2: COMRESET failed (errno=-16)
[   55.904200] ata2: limiting SATA link speed to 3.0 Gbps
[   60.930808] ata2: COMRESET failed (errno=-16)
[   60.931481] ata2: reset failed, giving up

At this point, I'm pretty much lost on what to do with these drives. I will check whether writing data to the disk (e.g. via shred) is at all possible.
Comment 59 sweeneygj 2015-10-29 19:41:12 UTC
(In reply to Johannes from comment #58)
> Created attachment 191501 [details]
> Logs of the session a day after Comment #53
> 
> I just started the machine again and tried to continue my SnapRAID restore
> procedure. I attached the dmesg of that session, after very few MiB written
> and read, the drive started to throw errors again. You can seen this in the
> upper area of the attached log.

[From dmesg]

[  797.639563] cdb[0]=0x88: 88 00 00 00 00 00 06 ab 56 00 00 00 02 00 00 00
[  797.639574] end_request: I/O error, dev sdb, sector 111891968
[  797.639620] sd 1:0:0:0: [sdb] Unhandled error code
[  797.639622] sd 1:0:0:0: [sdb]  
[  797.639624] Result: hostbyte=0x04 driverbyte=0x00
[  797.639626] sd 1:0:0:0: [sdb] CDB: 
[  797.639627] cdb[0]=0x8a: 8a 00 00 00 00 00 99 c0 41 50 00 00 00 10 00 00
[  797.639636] end_request: I/O error, dev sdb, sector 2579513680

I would scan the drive for errors first.
Comment 60 Anthony DeRobertis 2015-10-30 01:01:58 UTC
(In reply to Johannes from comment #58)

> After rebooting the machine (what else can I do at this point), …

I've seen that behavior too on the drives—or where they don't even show up at all. You need to do a full power cycle to bring them back. A reset/reboot is not sufficient.

[Honestly, this strikes me as a drive firmware bug.]
Comment 61 Johannes 2015-10-30 07:34:21 UTC
Created attachment 191521 [details]
dmesg and smartctl after letting the drive 'settle'

(In reply to Anthony DeRobertis from comment #60)
> I've seen that behavior too on the drives—or where they don't even show up
> at all. You need to do a full power cycle to bring them back. A reset/reboot
> is not sufficient.
> 
> [Honestly, this strikes me as a drive firmware bug.]

Yes, this is what I noticed too.
Yesterday, the drive went even further. After doing full power cycle, the drive didn't come back up (COMRESET failed before opening my encrypted /, not showing up in lsblk). Even I physically moved it into another machine, the drive didn't show up in Linux and even Windows 7. However, I noticed a periodic, even rhythmic noise from the device so I guessed it was 'doing' something, so I let it sit for at least 20min, then the noise had stopped.

I rebooted the machine and the drive was visible in both Linux and Windows. I then moved the drive back to the original machine where I started the SnapRAID restoration procedure. Because I restored around 370G so far, SnapRAID verified this first so it read around 370G, which worked flawlessly until around 360G, then I got input/output errors and SnapRAID stopped. The attachment contains a log of this session plus a smartctl -a.

I'm now running a full test with 'SeaTools' (on Windows, the FreeDOS based ones are plain buggy), I _think_ this drive is faulty. Sorry for the noise here, I hope this can be figured out....
Comment 62 sweeneygj 2015-10-30 09:57:55 UTC
(In reply to Johannes from comment #61)
> Created attachment 191521 [details]
> dmesg and smartctl after letting the drive 'settle'

> 
> I'm now running a full test with 'SeaTools' (on Windows, the FreeDOS based
> ones are plain buggy), I _think_ this drive is faulty. Sorry for the noise
> here, I hope this can be figured out....

Two AR13 and one AR15 firmware versions, the latter is now suspect. 

Are the AR13 file sizes significantly less than a snapraid parity file?
Written with 3.14.51 too?
Comment 63 Georg Lukas 2015-10-30 17:16:47 UTC
Just to add another three datapoints, running btrfs on two AR15 drives:

4.3-rc7 with reverted 4f258a46346c --> reliably gets ata errors during btrfs-scrub / when writing large amounts of data.

4.3-rc7 with reverted 9162c6579bf9 f9ca5ab832e7 --> initial btrfs-scrub passed, very sporadic errors later on, but can kill your raid anyway.

4.3-rc7 with reverted 9162c6579bf9 f9ca5ab832e7 4f258a46346c --> no ata errors after some days of activity... yet.
Comment 64 sweeneygj 2015-10-30 21:50:50 UTC
(In reply to Georg Lukas from comment #63)
> Just to add another three datapoints, running btrfs on two AR15 drives:
> 
> 4.3-rc7 with reverted 4f258a46346c --> reliably gets ata errors during
> btrfs-scrub / when writing large amounts of data.
> 
> 4.3-rc7 with reverted 9162c6579bf9 f9ca5ab832e7 --> initial btrfs-scrub
> passed, very sporadic errors later on, but can kill your raid anyway.

raid ..... with archive drives?
 
> 4.3-rc7 with reverted 9162c6579bf9 f9ca5ab832e7 4f258a46346c --> no ata
> errors after some days of activity... yet.

Regarding f9ca5ab832e7, see comment #17.

Can you test the following?

git checkout v4.3-rc7
git revert 4f258a46346c
git apply <patch linked to in comment #17>

It would be better to test on a single drive: can you safely btrfs delete device?
Comment 65 Georg Lukas 2015-11-11 13:03:44 UTC
(In reply to sweeneygj from comment #64)
> raid ..... with archive drives?

So far all the information I could find regarding that combination was that it is not advised by Seagate, due to the SMR performance characteristics. By using BTRFS, which only writes actual data during rebuilds, I hoped to mitigate that "problem".

> git checkout v4.3-rc7
> git revert 4f258a46346c
> git apply <patch linked to in comment #17>

Has been running for over a week now with different I/O patterns and without problems.

> It would be better to test on a single drive: can you safely btrfs delete
> device?

Unfortunately no. I'm currently running a degraded RAID1 with one disk only, which hopefully amounts to something like a single individual disk ;)
Comment 66 Martin K. Petersen 2015-11-11 14:41:57 UTC
Figured out what's going on. Patch coming...
Comment 68 Arzeets 2015-11-12 02:25:26 UTC
Georg Lukas, I compiled 4.3 with reverted 3 patches you mentioned (9162c6579bf9 f9ca5ab832e7 4f258a46346c) and ran 'btrfs balance' on my machine. With stock 4.3 kernel the balance crashes few minutes after start. With reverted 3 patches it was running ~30 hours without any error, then I got this one:


kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 0 PID: 12039 at fs/btrfs/extent-tree.c:2093 __btrfs_inc_extent_ref.isra.22+0x207/0x270 [btrfs]()
kernel: BTRFS: Transaction aborted (error -17)
kernel: Modules linked in: nct6775 hwmon_vid btrfs intel_rapl iosf_mbi xor x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc3
kernel: 
kernel: CPU: 0 PID: 12039 Comm: kworker/u4:4 Not tainted 4.3.0-1-ARCH #1
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H61M/U3S3, BIOS P2.20 07/30/2012
kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
kernel:  0000000000000000 000000000ae6b08b ffff88010bfafae8 ffffffff812bbc59
kernel:  ffff88010bfafb30 ffff88010bfafb20 ffffffff81074f42 0000000000000000
kernel:  ffff88007abd9800 ffff880117a3e2d0 0000000000000005 00000000ffffffef
kernel: Call Trace:
kernel:  [<ffffffff812bbc59>] dump_stack+0x4b/0x72
kernel:  [<ffffffff81074f42>] warn_slowpath_common+0x82/0xc0
kernel:  [<ffffffff81074fdc>] warn_slowpath_fmt+0x5c/0x80
kernel:  [<ffffffffa06ea437>] __btrfs_inc_extent_ref.isra.22+0x207/0x270 [btrfs]
kernel:  [<ffffffffa06efa12>] __btrfs_run_delayed_refs+0xca2/0x10d0 [btrfs]
kernel:  [<ffffffff810a4a9c>] ? __enqueue_entity+0x6c/0x70
kernel:  [<ffffffffa06f2bf2>] btrfs_run_delayed_refs+0x82/0x290 [btrfs]
kernel:  [<ffffffffa06f2e37>] delayed_ref_async_start+0x37/0x90 [btrfs]
kernel:  [<ffffffffa0739127>] btrfs_scrubparity_helper+0x77/0x2e0 [btrfs]
kernel:  [<ffffffffa07393ce>] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
kernel:  [<ffffffff8108cfeb>] process_one_work+0x14b/0x440
kernel:  [<ffffffff8108d328>] worker_thread+0x48/0x4a0
kernel:  [<ffffffff81579940>] ? __schedule+0x340/0xa60
kernel:  [<ffffffff8108d2e0>] ? process_one_work+0x440/0x440
kernel:  [<ffffffff8108d2e0>] ? process_one_work+0x440/0x440
kernel:  [<ffffffff81092e68>] kthread+0xd8/0xf0
kernel:  [<ffffffff81092d90>] ? kthread_worker_fn+0x170/0x170
kernel:  [<ffffffff8157de9f>] ret_from_fork+0x3f/0x70
kernel:  [<ffffffff81092d90>] ? kthread_worker_fn+0x170/0x170
kernel: ---[ end trace 47f32383c2e64422 ]---
kernel: BTRFS: error (device sdd) in __btrfs_inc_extent_ref:2093: errno=-17 Object already exists
kernel: BTRFS info (device sdd): forced readonly
kernel: BTRFS: error (device sdd) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists
kernel: BTRFS: error (device sdd) in __btrfs_inc_extent_ref:2093: errno=-17 Object already exists
kernel: BTRFS: error (device sdd) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists


I do not know if it is specific to HDD with SMR or it just a btrfs bug. I remounted the partition and started balance again - this time it finished after about 10 more hours.

Thus reverting those 3 commits definitely improves situation.





Martin, thanks, I am going to try your patch in the next few days and then provide you my feedback.
Comment 69 sweeneygj 2015-11-12 14:15:56 UTC
(In reply to Martin K. Petersen from comment #67)
> Please try this:
> 
> http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-
> 93581&id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4

Works for me. 

Applied to 4.3 and 240GB transferred without error.
Comment 70 Christoph Anton Mitterer 2015-11-13 02:51:36 UTC
hmm I also have some of these drives (ST8000AS0002-1NA17Z Firmware Version: AR13) and interestingly I've never had these problems...

I always ran at Debian sid (which is right now kernel 4.2.6) but I guess I got one of the first drives that were sold in Germany (when did they come out?) so I must have started with an older version.

Also, I use btrfs on those drives and once send/received one nearly full fs (6TB or so) in one bunch to another 8TB disk... and I noticed no single error[0]

So either the difference is btrfs, or something in Debian... or perhaps that I'm always running these devices via a USB/SATA bridge (with USB3.0 though)... so perhaps I never ran into these issues because of the slowness of the bridge.

HTH

[0] Well I have other problems now, but I guess these are purely btrfs related: http://thread.gmane.org/gmane.comp.file-systems.btrfs/50107
Comment 71 Robert Krig 2015-11-13 10:04:35 UTC
Hi, just wanted to chime in, in case this helps things further along.

I'm experiencing the same problems as everyone else here.

I'm running Debian Jessie (current stable), with a own compiled kernel 4.3.0 and using BTRFS as a filesystem.


I have a server with 4 drive hotswappable drive bay.

I bought a 8TB Seagate HD, stuck it in the bay, and proceeded to copy data to it. After about 40GB, the drive stopped responding and I noticed a lot of kernel errors. I figured the drive was faulty, so I ordered a replacement from amazon.

So the replacement arrived. I figured, this time I'd test it out first before I integrate it into my storage array. I hooked up the drive to usb-sata adapter, formatted with btrfs and proceeded to rsync my data onto it.

This ran the whole night and I managed to copy 670GB of data onto it. Via USB2.0 the data rate topped at about 20MB/s. I figured the drive is good, so I put into the drive bay of my server. Now directly attached via SATA.

And lo and behold, dmesg errors again. Just like everyone else has been reporting.

I'm not sure if it's simply the lower transfer speed, or the things are differently managed via a USB interface.

By the way, when the drive was attached via USB, I had it connected to a different computer running Debian Jessie with the 4.2.0 kernel.

I've now downgraded my server to kernel 3.16 to see if the errors still persist.
Comment 72 alex.wellmann 2015-11-13 10:10:26 UTC
Just a Comment from my side. This is definitely not just BTRFS related as I am having the same issues using EXT4 on 4.1 kernel (Ubuntu)...
Comment 73 Mario Kicherer 2015-11-13 10:15:48 UTC
I'd say the circumstances that provoke this issue are clear now. We should all test the patch of Martin in Comment 67 now with kernel 4.3 in order to get this patch into the next kernel version asap. 

I'll start a send/receive of ~6TB during the weekend.
Comment 74 warpme 2015-11-13 11:23:13 UTC
Martin,
It is possible to prepare patch https://bugzilla.kernel.org/show_bug.cgi?id=93581#c67 for 4.1 mainline?
I have 24/7 server running 4.1.8 with Seagate 8T SMR and all problems listed in this bugreport. I can easily test how Your fix is working in 4.1 kernel.
Comment 75 Robert Krig 2015-11-13 13:30:21 UTC
(In reply to Robert Krig from comment #71)

> 
> I've now downgraded my server to kernel 3.16 to see if the errors still
> persist.


Just wanted to mention, rsync has been running for a few hours now. Copied a further 900GB of Data. No kernel error messages what so ever on the standard Debian Jessie kernel "3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5"
Comment 76 Arzeets 2015-11-13 13:35:12 UTC
Martin K. Petersen, I was running Arch with 4.3 kernel + your patch and it makes the error go away.

I was running 'btrfs balance' for more than 24 hours now with spikes of additional read/write traffic. Arch 4.3 kernel with your patch handles it. No any kernel messages/errors. Everything looks good now.

Thanks Martin for the patch. When it is accepted upstream please 'CC' stable kernel also. It would be great to see it in 4.3 as well.
Comment 77 Futuref 2015-11-13 14:04:22 UTC
Hello together, as this ticket was created on the branch 4.19.x, I was wondering if this branch might get patched too, Ubuntu is running on it...
Comment 78 David Eisner 2015-11-15 15:07:56 UTC
Ubuntu 15.10 (Wily)
v4.3-0-g6a13feb with 7c4fbd5 from Comment 67
Ubuntu scripts from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3-unstable/ 

Model Number:       ST8000AS0002-1NA17Z
Firmware Revision:  AR13

I'm filling a new btrfs with ~6T. This seems to be working. I'm only on 160G, but it failed long before this with the the generic amd64 Ubuntu 14.04.3 LTS and 15.10 kernels. I will report if "WRITE FPDMA QUEUED" returns.

Thank you very much, Martin K. Petersen!

Is there anywhere else I should post in support of the patch?

It looked like a very small conflict to apply it to Ubuntu-4.2.0-19.23-0-g22ed8ce, but for me it was easier to just build v4.3.
Comment 79 Henk Slager 2015-11-15 16:50:47 UTC
(In reply to Martin K. Petersen from comment #67)
> Please try this:
> 
> http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-
> 93581&id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4

Thanks Martin for supplying the patch, I applied it to kernel 4.3.0 and booted a J1900N-D3V board with the ST8000AS0002-1NA17Z FW AR13 with it, SATA2 3Gbps.
Created a 2.3T partition (after a first 5T one), btrfs single d/m/s profile and started filling it by rsyncing data over Gbit LAN. 1.5TB transferred with roughly 50MB/s transferspeed total average, no ATA errors in dmesg. 

Then I put the drive in an H87M-PRO board, SATA3 6Gbps. Same transfertest, also no ATA errors in dmesg. 
The 5T partition is bcache + dm-crypt + btrfs(compress-force=zlib,nossd) that I have filled with kernel 3.18.22 on an AMD E-350 board, also without ATA errors in dmesg (total average roughly 10MB/s).

Then doing 3x parallel    dd_rescue /dev/sdX file   (3 independent SATA devices, files on 5T) resulted in ATA errors soon after transfers started. After about 150GB in total, the 5T fs went ro and kernel BUG in btrfs/extent-tree.c due to the I/O errors I assume.
Comment 80 Mario Kicherer 2015-11-15 21:24:40 UTC
My machine successfully transferred a 5 TB btrfs subvolume with kernel 4.3 and Martin's patch. I will compare the data now but everything looks good so far. Thank you Martin!
Comment 81 Max 2015-11-26 14:03:20 UTC
Hi, will this bug ever officially been fixed? How can i fix this bug without changing the kernel, now i cant use my virtualbox, because of an missing 'modprobe vboxdrv' .. Without the fix my hdd is clicking all the time and is slow and same times unaccessible. With w7 it worked fine.
How can i add the fix to the existing kernel?
Comment 82 Robert Krig 2015-11-30 10:22:35 UTC
Hi, kernel 4.4 was just released. I checked the changelogs but I didn't see any mention of this bugfix. 

Does anyone know if the patch that fixes this bug has already been implemented in upstream kernel 4.4 release?
Comment 83 Mario Kicherer 2015-12-02 21:08:08 UTC
Good news, Martin's patch was accepted:

Your commit has been added to the upstream scsi tree On branch "fixes"
This patch is scheduled to be pushed for 4.4
Comment 84 Arzeets 2015-12-02 21:11:27 UTC
Great news!!! And this patch should be absolutely backported to stable kernel as well.
Comment 85 Arzeets 2015-12-07 18:40:19 UTC
This patch is part of upstream kernel now. Its upstream sha is ca369d51b3e1649be4a72addd6d6a168cfb3f537

Thanks for your work Martin. Could you please send it to stable maillist please? The change should be really backported to previous kernel versions.
Comment 86 Anthony DeRobertis 2015-12-09 18:57:15 UTC
I am running a btrfs balance on 4.4rc4 (which has the patch) and can confirm it seems to be working. No errors so far, and the balance is almost done.
Comment 87 Henk Slager 2015-12-12 15:30:49 UTC
I have upgraded the kernel from 4.3.0+patch to 4.4.0-rc4.  /sys/block/sdb/device/queue_depth is 31, 3Gbps SATA (chipset limit), the disk is an iSCSI target and roughly 400G extra has been written/pushed to it so far. No errors, all runs fine.
Comment 88 shan 2015-12-15 17:07:02 UTC
I compiled 4.4.0-rc5 and gave it a try. I am not facing any ATA errors like before but now drive is sooo slow (and also quite warm). For writing data (I do copy with Midnight Commander) it shows almost 1MB/s. 

I don't know are my problems now related to luks or something else. I am running ext4 inside luks container.

I get many following sets.

[ 2280.116086] INFO: task kworker/u8:0:2535 blocked for more than 120 seconds.
[ 2280.116092]       Tainted: P           O    4.4.0-rc5 #1
[ 2280.116094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2280.116096] kworker/u8:0    D 0000000000000001     0  2535      2 0x00000000
[ 2280.116106] Workqueue: writeback wb_workfn (flush-251:1)
[ 2280.116110]  ffff8800994786c0 ffff8801c8284e40 ffff8801c5efc000 ffff8801c5efb670
[ 2280.116114]  7fffffffffffffff ffff8801cffda1d0 ffff8801c5efb708 ffffffff81559830
[ 2280.116118]  ffffffff81559121 0000000000000000 ffffffff8155bb49 7fffffffffffffff
[ 2280.116122] Call Trace:
[ 2280.116129]  [<ffffffff81559830>] ? bit_wait+0x60/0x60
[ 2280.116133]  [<ffffffff81559121>] ? schedule+0x31/0x80
[ 2280.116136]  [<ffffffff8155bb49>] ? schedule_timeout+0x189/0x250
[ 2280.116140]  [<ffffffff81559830>] ? bit_wait+0x60/0x60
[ 2280.116144]  [<ffffffff8155879d>] ? io_schedule_timeout+0x9d/0x100
[ 2280.116148]  [<ffffffff810aa495>] ? prepare_to_wait_exclusive+0x55/0x80
[ 2280.116151]  [<ffffffff81559847>] ? bit_wait_io+0x17/0x60
[ 2280.116155]  [<ffffffff81559627>] ? __wait_on_bit_lock+0x47/0xa0
[ 2280.116158]  [<ffffffff81559830>] ? bit_wait+0x60/0x60
[ 2280.116161]  [<ffffffff815596ee>] ? out_of_line_wait_on_bit_lock+0x6e/0x80
[ 2280.116165]  [<ffffffff810aa870>] ? autoremove_wake_function+0x40/0x40
[ 2280.116174]  [<ffffffffa02229cb>] ? do_get_write_access+0x1fb/0x450 [jbd2]
[ 2280.116178]  [<ffffffff811f4168>] ? __find_get_block+0xe8/0xf0
[ 2280.116182]  [<ffffffff811f4446>] ? __getblk_gfp+0x26/0x50
[ 2280.116187]  [<ffffffffa0222c6a>] ? jbd2_journal_get_write_access+0x4a/0x60 [jbd2]
[ 2280.116210]  [<ffffffffa0274d86>] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
[ 2280.116222]  [<ffffffffa0247a1e>] ? ext4_reserve_inode_write+0x6e/0x90 [ext4]
[ 2280.116238]  [<ffffffffa026eb15>] ? ext4_ext_insert_extent+0x6e5/0x1300 [ext4]
[ 2280.116250]  [<ffffffffa0247a7f>] ? ext4_mark_inode_dirty+0x3f/0x1c0 [ext4]
[ 2280.116273]  [<ffffffffa026eb15>] ? ext4_ext_insert_extent+0x6e5/0x1300 [ext4]
[ 2280.116285]  [<ffffffffa027cad9>] ? ext4_mb_new_blocks+0x139/0x4a0 [ext4]
[ 2280.116296]  [<ffffffffa027216e>] ? ext4_ext_map_blocks+0x3de/0xd00 [ext4]
[ 2280.116299]  [<ffffffff81155e0c>] ? find_get_pages_tag+0x10c/0x160
[ 2280.116308]  [<ffffffffa02450d5>] ? ext4_map_blocks+0x165/0x490 [ext4]
[ 2280.116316]  [<ffffffffa0248167>] ? ext4_writepages+0x567/0xcc0 [ext4]
[ 2280.116319]  [<ffffffff811ec52d>] ? __writeback_single_inode+0x3d/0x290
[ 2280.116321]  [<ffffffff811ecbfe>] ? writeback_sb_inodes+0x1de/0x3e0
[ 2280.116323]  [<ffffffff811ece81>] ? __writeback_inodes_wb+0x81/0xb0
[ 2280.116325]  [<ffffffff811ed13f>] ? wb_writeback+0x1ef/0x280
[ 2280.116327]  [<ffffffff811ed846>] ? wb_workfn+0x1e6/0x340
[ 2280.116330]  [<ffffffff8108554f>] ? process_one_work+0x19f/0x3d0
[ 2280.116332]  [<ffffffff810857cd>] ? worker_thread+0x4d/0x450
[ 2280.116335]  [<ffffffff81085780>] ? process_one_work+0x3d0/0x3d0
[ 2280.116337]  [<ffffffff8108b29d>] ? kthread+0xbd/0xe0
[ 2280.116339]  [<ffffffff8108b1e0>] ? kthread_create_on_node+0x170/0x170
[ 2280.116342]  [<ffffffff8155cd8f>] ? ret_from_fork+0x3f/0x70
[ 2280.116344]  [<ffffffff8108b1e0>] ? kthread_create_on_node+0x170/0x170

After many of these I get 

[ 3445.343562] perf interrupt took too long (2508 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

Then errors are gone, but bad performance stays.
Comment 89 Ciarán Coffey 2015-12-15 17:10:59 UTC
Is that drive almost full?
SMR drives will slow down a lot when they start to fill.
Comment 90 shan 2015-12-15 17:24:15 UTC
No, I have managed to copy less than 1gb. It's sooo slow.
I dont think SMR disk speed would ever reduce to these speeds.
Comment 91 Anthony DeRobertis 2015-12-15 17:44:01 UTC
(In reply to shan from comment #90)
> No, I have managed to copy less than 1gb. It's sooo slow.
> I dont think SMR disk speed would ever reduce to these speeds.

btrfs rebalance was doing ~55MB/sec for a few days straight to ours. That was with -rc4, and about 70% full. 

The "blocked for more than 120 seconds" messages are I suspect just telling you what you already know—it's going slow.

Suggest checking smartctl, iostat, etc.
Comment 92 shan 2015-12-25 06:50:19 UTC
I did many tests, removed luks, tried with plain ext4, attached straight to sata (I used it from the usb docking station). 
Funny thing was that from usb docking station it copied 800Mb fast (at least Midnight commander showed show), before it froze. Attached to sata it copied 1600Mb fast before frozage.

Then I started to play with partition table and noticed that parted and fdisk made partitions which always gave warning about alignment? "Warning: The resulting partition is not properly aligned for best performance.".

Then I created partition with gparted and now parted showed:
(parted) align-check opt 1                                                
1 aligned

And voila now I have managed to achieve those ~55MB/sec speeds.
Comment 93 Christoph Anton Mitterer 2015-12-27 00:09:56 UTC
Hmm sounds quite worrying... while it's understandable that you may suffer from some performance penalty when you fs (or any other block layer) isn't aligned, one shouldn't suffer from freeze... :(

May indicate that the fix isn't complete yet.
Comment 94 shan 2015-12-27 05:28:27 UTC
I have now copied 6TB of data and everything seems to work.

But, I have got two of those "Blocked for more than 120" messages and I thought that disk will freeze again, but with both cases it continued with good speed. This happened when I had copied around 70GB.

There was something about ext4 lazy init, If I remember correctly (I lost logs),  also there were some messages related the source I copied data from (seeks errors with USB drive). It could be that errors with source made these "blocked" messages.
Comment 95 shan 2015-12-27 05:41:53 UTC
Found the logs for those two "blocked for more than 120". I think these are not related to the source, the source errors happened later.

Dec 25 09:17:22 i7 kernel: [ 1560.020132] INFO: task ext4lazyinit:2009 blocked for more than 120 seconds.
Dec 25 09:17:22 i7 kernel: [ 1560.020141]       Not tainted 4.4.0-rc6-amd64 #1
Dec 25 09:17:22 i7 kernel: [ 1560.020144] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 25 09:17:22 i7 kernel: [ 1560.020148] ext4lazyinit    D ffff88042fa95940     0  2009      2 0x00000000
Dec 25 09:17:22 i7 kernel: [ 1560.020157]  ffff8803f86ecf40 ffff88041d6b2180 ffff8803d8c10000 ffff8803d8c0fcf8
Dec 25 09:17:22 i7 kernel: [ 1560.020163]  7fffffffffffffff ffff8803f86ecf40 0000000000001000 000077ff80000000
Dec 25 09:17:22 i7 kernel: [ 1560.020168]  ffffffff81586291 0000000000000000 ffffffff81589069 7fffffffffffffff
Dec 25 09:17:22 i7 kernel: [ 1560.020174] Call Trace:
Dec 25 09:17:22 i7 kernel: [ 1560.020186]  [<ffffffff81586291>] ? schedule+0x31/0x80
Dec 25 09:17:22 i7 kernel: [ 1560.020193]  [<ffffffff81589069>] ? schedule_timeout+0x189/0x250
Dec 25 09:17:22 i7 kernel: [ 1560.020209]  [<ffffffffa06dfd4c>] ? dm_make_request+0x6c/0xc0 [dm_mod]
Dec 25 09:17:22 i7 kernel: [ 1560.020214]  [<ffffffff8158590d>] ? io_schedule_timeout+0x9d/0x100
Dec 25 09:17:22 i7 kernel: [ 1560.020219]  [<ffffffff81587651>] ? wait_for_completion_io+0xf1/0x130
Dec 25 09:17:22 i7 kernel: [ 1560.020229]  [<ffffffff8109c470>] ? wake_up_q+0x60/0x60
Dec 25 09:17:22 i7 kernel: [ 1560.020237]  [<ffffffff812bfa30>] ? blkdev_issue_zeroout+0x1f0/0x260
Dec 25 09:17:22 i7 kernel: [ 1560.020260]  [<ffffffffa022f6f2>] ? ext4_init_inode_table+0x182/0x370 [ext4]
Dec 25 09:17:22 i7 kernel: [ 1560.020284]  [<ffffffffa024b6a3>] ? ext4_lazyinit_thread+0x123/0x2b0 [ext4]
Dec 25 09:17:22 i7 kernel: [ 1560.020305]  [<ffffffffa024b580>] ? init_once+0x60/0x60 [ext4]
Dec 25 09:17:22 i7 kernel: [ 1560.020313]  [<ffffffff8109152d>] ? kthread+0xcd/0xf0
Dec 25 09:17:22 i7 kernel: [ 1560.020319]  [<ffffffff81091460>] ? kthread_create_on_node+0x190/0x190
Dec 25 09:17:22 i7 kernel: [ 1560.020326]  [<ffffffff8158a2cf>] ? ret_from_fork+0x3f/0x70
Dec 25 09:17:22 i7 kernel: [ 1560.020332]  [<ffffffff81091460>] ? kthread_create_on_node+0x190/0x190
Dec 25 09:35:22 i7 kernel: [ 2639.936652] INFO: task ext4lazyinit:2009 blocked for more than 120 seconds.
Dec 25 09:35:22 i7 kernel: [ 2639.936661]       Not tainted 4.4.0-rc6-amd64 #1
Dec 25 09:35:22 i7 kernel: [ 2639.936664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 25 09:35:22 i7 kernel: [ 2639.936667] ext4lazyinit    D ffff88042fa95940     0  2009      2 0x00000000
Dec 25 09:35:22 i7 kernel: [ 2639.936676]  ffff8803f86ecf40 ffff88041d6b2180 ffff8803d8c10000 ffff8803d8c0fcf8
Dec 25 09:35:22 i7 kernel: [ 2639.936682]  7fffffffffffffff ffff8803f86ecf40 0000000000001000 000077ff80000000
Dec 25 09:35:22 i7 kernel: [ 2639.936687]  ffffffff81586291 0000000000000000 ffffffff81589069 7fffffffffffffff
Dec 25 09:35:22 i7 kernel: [ 2639.936693] Call Trace:
Dec 25 09:35:22 i7 kernel: [ 2639.936705]  [<ffffffff81586291>] ? schedule+0x31/0x80
Dec 25 09:35:22 i7 kernel: [ 2639.936712]  [<ffffffff81589069>] ? schedule_timeout+0x189/0x250
Dec 25 09:35:22 i7 kernel: [ 2639.936728]  [<ffffffffa06dfd4c>] ? dm_make_request+0x6c/0xc0 [dm_mod]
Dec 25 09:35:22 i7 kernel: [ 2639.936733]  [<ffffffff8158590d>] ? io_schedule_timeout+0x9d/0x100
Dec 25 09:35:22 i7 kernel: [ 2639.936739]  [<ffffffff81587651>] ? wait_for_completion_io+0xf1/0x130
Dec 25 09:35:22 i7 kernel: [ 2639.936748]  [<ffffffff8109c470>] ? wake_up_q+0x60/0x60
Dec 25 09:35:22 i7 kernel: [ 2639.936756]  [<ffffffff812bfa30>] ? blkdev_issue_zeroout+0x1f0/0x260
Dec 25 09:35:22 i7 kernel: [ 2639.936780]  [<ffffffffa022f6f2>] ? ext4_init_inode_table+0x182/0x370 [ext4]
Dec 25 09:35:22 i7 kernel: [ 2639.936803]  [<ffffffffa024b6a3>] ? ext4_lazyinit_thread+0x123/0x2b0 [ext4]
Dec 25 09:35:22 i7 kernel: [ 2639.936824]  [<ffffffffa024b580>] ? init_once+0x60/0x60 [ext4]
Dec 25 09:35:22 i7 kernel: [ 2639.936831]  [<ffffffff8109152d>] ? kthread+0xcd/0xf0
Dec 25 09:35:22 i7 kernel: [ 2639.936838]  [<ffffffff81091460>] ? kthread_create_on_node+0x190/0x190
Dec 25 09:35:22 i7 kernel: [ 2639.936844]  [<ffffffff8158a2cf>] ? ret_from_fork+0x3f/0x70
Dec 25 09:35:22 i7 kernel: [ 2639.936851]  [<ffffffff81091460>] ? kthread_create_on_node+0x190/0x190
Comment 96 Christoph Anton Mitterer 2015-12-28 00:46:58 UTC
Hmm what exactly did you mean with "freeze" in the first place?
The system being unresponsive for a while, but then continuing without any errors except e.g. those 120s blocked messages?
Cause that could be just "normal" (or a pointer that the IO scheduler could be improved ;)... but if "freeze" means the system crashed or became unresponsive forever, that would be a "real" bug.

Those "blocked for more than 120" are probably no reason to worry much.

Did you experience any other fs corruptions then?
Comment 97 shan 2015-12-28 04:27:59 UTC
Before first bugfix, yes I lost everything (I am using luks). After fix I haven't faced any corruption (At least luks container is still intact).

Hmm problem is that I dont know how Midnight Commander works, but that is my "frontend" where I see this frozage. Frontend shows me progress of copying process. It shows nice bar where first 800Mb goes really fast to the disk before it freezes, it is normal behaviour (at some point) because it waits for data to be really written to the disk (sync). During progress it calculates the average speed. After this 800Mb Midnight commander is unresponsive and average speed goes as low as some kilobits/s. I can try to stop copying process or shutdown the computer but it takes many minutes (Even tens of) until something is responded from midnight commander.

So, continuing without errors... Yes but with that ~2KB/s speed. There seems to be 10 of those blocked for messages and after that no errors on logs, but this frozage continues until I managed to restart computer and stop that copying process. 

This was before I recreated partition table. 

Currently after my partition table fix I think it is "normal operation" because after both of those "blocked for" messages copying continued with reasonable speeds ~70MB/s. And I didn't face em anymore (7TB copied now) seems to be much slower at the end, but if I have understood correctly it's the feature of this disk.
Comment 98 hatl 2015-12-29 19:25:55 UTC
Unfortunately, using the latest kernel doesn't work for me.

Ubuntu wily, Kernel 4.4-rc7 from Ubuntu mainline ppa:


[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#24 CDB: Read(16) 88 00 00 00 00 00 50 68 4a 60 00 00 00 20 00 00
[Tue Dec 29 20:14:09 2015] -  blk_update_request: I/O error, dev sdb, sector 1349012064
[Tue Dec 29 20:14:09 2015] -  BTRFS error (device sdb): bdev /dev/sdb errs: wr 6387463, rd 69213, flush 4398, corrupt 0, gen 0
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#25 CDB: Write(16) 8a 00 00 00 00 00 50 68 4a 60 00 00 00 08 00 00
[Tue Dec 29 20:14:09 2015] -  blk_update_request: I/O error, dev sdb, sector 1349012064
[Tue Dec 29 20:14:09 2015] -  BTRFS error (device sdb): bdev /dev/sdb errs: wr 6387464, rd 69213, flush 4398, corrupt 0, gen 0
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#26 CDB: Read(16) 88 00 00 00 00 00 50 67 73 80 00 00 00 20 00 00
[Tue Dec 29 20:14:09 2015] -  blk_update_request: I/O error, dev sdb, sector 1348957056
[Tue Dec 29 20:14:09 2015] -  BTRFS error (device sdb): bdev /dev/sdb errs: wr 6387464, rd 69214, flush 4398, corrupt 0, gen 0
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:14:09 2015] -  sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 50 67 73 80 00 00 00 08 00 00
[Tue Dec 29 20:14:09 2015] -  blk_update_request: I/O error, dev sdb, sector 1348957056
[Tue Dec 29 20:14:09 2015] -  BTRFS error (device sdb): bdev /dev/sdb errs: wr 6387465, rd 69214, flush 4398, corrupt 0, gen 0
[Tue Dec 29 20:14:11 2015] -  BTRFS warning (device sdb): lost page write due to IO error on /dev/sdb
[Tue Dec 29 20:14:11 2015] -  BTRFS warning (device sdb): lost page write due to IO error on /dev/sdb
[Tue Dec 29 20:14:11 2015] -  BTRFS warning (device sdb): lost page write due to IO error on /dev/sdb
[Tue Dec 29 20:18:32 2015] -  scsi_io_completion: 934 callbacks suppressed
[Tue Dec 29 20:18:32 2015] -  sd 2:0:0:0: [sdb] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:18:32 2015] -  sd 2:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a 00 00 00 00 08 00 00
[Tue Dec 29 20:18:32 2015] -  blk_update_request: 937 callbacks suppressed
[Tue Dec 29 20:18:32 2015] -  blk_update_request: I/O error, dev sdb, sector 15628052992
[Tue Dec 29 20:18:32 2015] -  sd 2:0:0:0: [sdb] tag#5 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:18:32 2015] -  sd 2:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a 00 00 00 00 08 00 00
[Tue Dec 29 20:18:32 2015] -  blk_update_request: I/O error, dev sdb, sector 15628052992
[Tue Dec 29 20:18:32 2015] -  Buffer I/O error on dev sdb, logical block 1953506624, async page read
[Tue Dec 29 20:20:13 2015] -  sd 2:0:0:0: [sdb] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:20:13 2015] -  sd 2:0:0:0: [sdb] tag#7 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a 00 00 00 00 08 00 00
[Tue Dec 29 20:20:13 2015] -  blk_update_request: I/O error, dev sdb, sector 15628052992
[Tue Dec 29 20:20:13 2015] -  Buffer I/O error on dev sdb, logical block 1953506624, async page read
[Tue Dec 29 20:20:44 2015] -  sd 2:0:0:0: [sdb] tag#9 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:20:44 2015] -  sd 2:0:0:0: [sdb] tag#9 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a 00 00 00 00 08 00 00
[Tue Dec 29 20:20:44 2015] -  blk_update_request: I/O error, dev sdb, sector 15628052992
[Tue Dec 29 20:20:44 2015] -  Buffer I/O error on dev sdb, logical block 1953506624, async page read
[Tue Dec 29 20:20:44 2015] -  sd 2:0:0:0: [sdb] tag#10 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:20:44 2015] -  sd 2:0:0:0: [sdb] tag#10 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a 00 00 00 00 08 00 00
[Tue Dec 29 20:20:44 2015] -  blk_update_request: I/O error, dev sdb, sector 15628052992
[Tue Dec 29 20:20:44 2015] -  Buffer I/O error on dev sdb, logical block 1953506624, async page read
[Tue Dec 29 20:20:46 2015] -  sd 2:0:0:0: [sdb] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Tue Dec 29 20:20:46 2015] -  sd 2:0:0:0: [sdb] tag#12 CDB: Read(16) 88 00 00 00 00 03 a3 81 2a 00 00 00 00 08 00 00
Comment 99 Thomas DEBESSE 2016-01-05 23:31:05 UTC
Hi, I own those drives too and I'm facing problems too. :-)

I made some tests with different hardware configuration so if you experience some issue too, perhaps this (very long) comment can help you to find a solution to be able to use your hardware.

I bought some of these drives to backup personal data (two 4Tb RAID1). My idea was to regularly backup both RAID arrays on one drive, while older copies are kept on others drives in lockers in some different geographical locations, then regularly swap the drive who receives frequent updates with one of the drives kept in different geographical places.

Like other people in this thread, I want to use the ext4 file system on a LUKS volume. Perhaps it can fool the hard drive internal garbage collection other people talked about if this use case prevents linear writing.

Since I own a e-SATA/USB 3.0 dock, I was innocently planning to use it to sync my data on the frequent backup drive. I've read some people in this thread tried to use these drives this way too.

It's a very bad idea to use a dock with this kind of hard drives and does not work at all. I tried 3.18.0, 3.18.21, 4.3.3 and 4.4.0-rc7 kernels with my hard drive dock without success.

This is my test:

1. plug
2. cryptsetup luksFormat
3. cryptsetup luksOpen
4. mkfs.ext4
5. mount
6. umount
7. cryptsetup luksClose
8. unplug

In two cases (detailed below), I tried to write some files between steps 5. and 6.

For information, that is what the kernel printed in dmesg about my dock when used in USB 3.0 mode:

[ 4336.726082] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 4336.740496] usb 4-1: New USB device found, idVendor=174c, idProduct=55aa
[ 4336.740503] usb 4-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[ 4336.740508] usb 4-1: Product: ASM1053E
[ 4336.740511] usb 4-1: Manufacturer: asmedia
[ 4336.740515] usb 4-1: SerialNumber: 123456789012
[ 4336.741640] usb 4-1: USB controller 0000:0b:00.0 does not support streams, which are required by the UAS driver.
[ 4336.741648] usb 4-1: Please try an other USB controller if you wish to use UAS.

I've not used any USB 3.1 / UAS dock so I can't tell if it works.

When I plug the hard drive I get:

[ 4337.740121] sd 12:0:0:0: Attached scsi generic sg12 type 0
[ 4337.747901] sd 12:0:0:0: [sdk] Spinning up disk...
[ 4338.750173] ...................ready
[ 4356.971833] sd 12:0:0:0: [sdk] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
[ 4356.971835] sd 12:0:0:0: [sdk] 4096-byte physical blocks
[ 4356.974430] sd 12:0:0:0: [sdk] Write Protect is off
[ 4356.974433] sd 12:0:0:0: [sdk] Mode Sense: 43 00 00 00
[ 4356.977029] sd 12:0:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4357.002553] sd 12:0:0:0: [sdk] Attached SCSI disk
[ 5277.631070]  sdk: sdk1

With 4.3 kernels I get that doing a mkfs.ext4:

[ 5756.790494] INFO: task mkfs.ext4:21464 blocked for more than 120 seconds.
[ 5756.790503]       Tainted: G           OE   4.3.3-040303-generic #201512150130
[ 5756.790506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5756.790509] mkfs.ext4       D ffff88083ecd6980     0 21464  18542 0x00000000
[ 5756.790517]  ffff8807ab0c3d08 0000000000000086 ffff88081aefaa00 ffff8807719daa00
[ 5756.790523]  ffff8807ab0c4000 ffff8807ab0c3e40 ffff8807ab0c3e38 ffff8807719daa00
[ 5756.790527]  0000000000000000 ffff8807ab0c3d20 ffffffff817e7a53 7fffffffffffffff
[ 5756.790532] Call Trace:
[ 5756.790545]  [<ffffffff817e7a53>] schedule+0x33/0x80
[ 5756.790551]  [<ffffffff817eab19>] schedule_timeout+0x189/0x250
[ 5756.790558]  [<ffffffff8138fe83>] ? __blk_run_queue+0x33/0x40
[ 5756.790562]  [<ffffffff81395e9b>] ? blk_queue_bio+0x33b/0x350
[ 5756.790570]  [<ffffffff817e8473>] wait_for_completion+0xb3/0x140
[ 5756.790576]  [<ffffffff810a7b20>] ? wake_up_q+0x60/0x60
[ 5756.790580]  [<ffffffff8138af8b>] submit_bio_wait+0x6b/0x90
[ 5756.790585]  [<ffffffff81397b7d>] blkdev_issue_flush+0x5d/0x90
[ 5756.790591]  [<ffffffff8123be25>] blkdev_fsync+0x35/0x50
[ 5756.790596]  [<ffffffff81234b8b>] vfs_fsync_range+0x4b/0xb0
[ 5756.790602]  [<ffffffff81003b13>] ? prepare_exit_to_usermode+0xa3/0x100
[ 5756.790607]  [<ffffffff81234c4d>] do_fsync+0x3d/0x70
[ 5756.790611]  [<ffffffff81003bbc>] ? syscall_return_slowpath+0x4c/0x120
[ 5756.790616]  [<ffffffff81234ee0>] SyS_fsync+0x10/0x20
[ 5756.790621]  [<ffffffff817ebab6>] entry_SYSCALL_64_fastpath+0x16/0x75

Some time it also printed:

[ 5896.684697] sd 12:0:0:0: timing out command, waited 360s
[ 5896.684712] blk_update_request: I/O error, dev sdk, sector 0

Increasing the '/sys/block/sdk/device/timeout' value only hide "task blocked for more than N seconds" messages but does not prevent "I/O error" messages to print.

The mkfs.ext4 took 12 minutes to finish. During this time, the hard drive was not available (I was not able to query SMART data for example).

I don't know if that I/O error message means my file system is broken or not.

After the mkfs was complete, I tried to mount the file system, it took 6 minutes. Then I tried to unmount the file system immediately without writing something, it took 35 minutes to unmount.

I've tried that with 3.18.0, 3.18.21, 4.3.3 and 4.4.0-rc7 kernels without success. I got the exact same behavior each time.

During all my tests, these are the process that triggered a "task blocked for more than 120 seconds':

- ext4lazyinit
- jbd2/dm-7-8
- mkfs.ext4
- mount
- pool
- smartctl

One time I tried to write some data on the hard drive (using rsync), it took more than one hour to write around 300Mb.

Then I tried to use my hard drive with the same dock in e-SATA mode (with a 4.3.3 kernel).

The kernel printed that in dmesg on hotplug, formating with LUKS, doing some smart queries, and creating a file system:

[ 9809.935995] ata7: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
[ 9809.935999] ata7: irq_stat 0x00000040, connection status changed
[ 9809.936000] ata7: SError: { PHYRdyChg CommWake DevExch }
[ 9809.936004] ata7: hard resetting link
[ 9819.926107] ata7: softreset failed (1st FIS failed)
[ 9819.926111] ata7: hard resetting link
[ 9829.923484] ata7: softreset failed (1st FIS failed)
[ 9829.923494] ata7: hard resetting link
[ 9830.986743] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9831.023104] ata7.00: ATA-9: ST8000AS0002-1NA17Z, AR13, max UDMA/133
[ 9831.023112] ata7.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 9831.024383] ata7.00: configured for UDMA/133
[ 9831.024394] ata7: EH complete
[ 9831.024645] scsi 6:0:0:0: Direct-Access     ATA      ST8000AS0002-1NA AR13 PQ: 0 ANSI: 5
[ 9831.025124] sd 6:0:0:0: [sdl] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
[ 9831.025133] sd 6:0:0:0: [sdl] 4096-byte physical blocks
[ 9831.025224] sd 6:0:0:0: Attached scsi generic sg12 type 0
[ 9831.025300] sd 6:0:0:0: [sdl] Write Protect is off
[ 9831.025308] sd 6:0:0:0: [sdl] Mode Sense: 00 3a 00 00
[ 9831.025361] sd 6:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 9831.081474]  sdl: sdl1
[ 9831.082197] sd 6:0:0:0: [sdl] Attached SCSI disk
[ 9878.059460] ata7.00: exception Emask 0x10 SAct 0x2000 SErr 0x480000 action 0x6 frozen
[ 9878.059464] ata7.00: irq_stat 0x08000000, interface fatal error
[ 9878.059465] ata7: SError: { 10B8B Handshk }
[ 9878.059467] ata7.00: failed command: WRITE FPDMA QUEUED
[ 9878.059470] ata7.00: cmd 61/fa:68:08:10:00/00:00:00:00:00/40 tag 13 ncq 128000 out
                        res 40/00:68:08:10:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9878.059472] ata7.00: status: { DRDY }
[ 9878.059474] ata7: hard resetting link
[ 9888.052845] ata7: softreset failed (1st FIS failed)
[ 9888.052855] ata7: hard resetting link
[ 9898.046139] ata7: softreset failed (1st FIS failed)
[ 9898.046144] ata7: hard resetting link
[ 9933.026910] ata7: softreset failed (1st FIS failed)
[ 9933.026915] ata7: limiting SATA link speed to 3.0 Gbps
[ 9933.026916] ata7: hard resetting link
[ 9938.027583] ata7: softreset failed (1st FIS failed)
[ 9938.027594] ata7: reset failed, giving up
[ 9938.027598] ata7.00: disabled
[ 9938.027619] ata7: EH complete
[ 9938.027672] sd 6:0:0:0: [sdl] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 9938.027679] sd 6:0:0:0: [sdl] tag#14 CDB: Write(16) 8a 00 00 00 00 00 00 00 10 08 00 00 00 fa 00 00
[ 9938.027683] blk_update_request: I/O error, dev sdl, sector 4104
[ 9938.027792] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 9938.027839] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 9938.027868] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 9938.031307] sd 6:0:0:0: [sdl] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 9938.031316] sd 6:0:0:0: [sdl] tag#21 CDB: Read(16) 88 00 00 00 00 03 a3 81 27 80 00 00 00 08 00 00
[ 9938.031321] blk_update_request: I/O error, dev sdl, sector 15628052352
[ 9938.031362] sd 6:0:0:0: [sdl] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 9938.031366] sd 6:0:0:0: [sdl] tag#22 CDB: Read(16) 88 00 00 00 00 03 a3 81 27 80 00 00 00 08 00 00
[ 9938.031368] blk_update_request: I/O error, dev sdl, sector 15628052352
[ 9938.031371] Buffer I/O error on dev sdl1, logical block 1953506032, async page read
[ 9938.035775] sd 6:0:0:0: [sdl] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 9938.035788] sd 6:0:0:0: [sdl] tag#24 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[ 9938.035795] blk_update_request: I/O error, dev sdl, sector 0
[ 9938.035837] sd 6:0:0:0: [sdl] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 9938.035845] sd 6:0:0:0: [sdl] tag#25 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[ 9938.035851] blk_update_request: I/O error, dev sdl, sector 0
[ 9938.035857] Buffer I/O error on dev sdl, logical block 0, async page read
[ 9938.035942] sd 6:0:0:0: [sdl] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 9938.035949] sd 6:0:0:0: [sdl] tag#26 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[ 9938.035953] blk_update_request: I/O error, dev sdl, sector 0
[ 9938.035958] Buffer I/O error on dev sdl, logical block 0, async page read

The hard drive went completely wrong during the mkfs.ext4 step and that step was not able to finish, I had to force the hard-drive / dock shutdown leaving the hard drive with an incomplete ext4 file system on an opened LUKS volume (very bad).

Then I tried to use the hard drive on an internal SATA III plug. It's not convenient since I want to be able to hotplug it only at backup time, and these strange hard drives seem to not have standard screw hole on the underside (I can only screw on 2 screws instead of 4 on my bays like my other 4Tb Seagate hard drives). But I need to know if these drives works on internal SATA, because if not I bought expensive paperweight instead of hard drives !

I tried it on a 3.18.0 kernel only (my test is not finished), but it works.

The mkfs.ext4 took only seconds. Then I started to rsync my whole file tree (two file systems on two 4Tb RAID1 volumes). It started fastly then around first 100Gb it slowed down. Since that it's continuing without error. It's slow but it works. It took 2Tb of data in 12 hours, so it will take some days to complete the test. It's not a problem to me if it's so slow since next rsync will only update modified files (it will be far far faster next time to only write differences).

The kernel printed some 'task blocked for more than 120 seconds' but I never got any 'I/O error'.

So, it seems to works only when I use an internal SATA III connection. If someone wants to use one of these drive, an external dock is not yet an option. If something can be optimized in the kernel to improve the hard drive management, some improvement must be done in e-SATA and USB code too to be able to use theses drives in external docks. If you plan to use an external dock and can't plug your drive internally (using an hotswap bay if you need hotswap), do not buy these drives, you will lose your money.

Since I need to hotplug my drives easily, I bought a SATA III hotplug tray-less bay that fit in a 5.25" CD enclosure. I've not received it yet but I did not expect any issue and it's probably the only current solution for people wanting to hotplug/hotunplug these drives. I strongly recommend to buy one bay with an integrated fan since I noticed that these drives can heat a lot !

I found two interesting trayless hard drive hotplug bay to put in a CD enclosure :

* Icy Dock TurboSwap MB171SP-B
[ http://www.icydock.com/goods.php?id=169 ]
+ trayless
+ can get 3.5" hard drives (and 2.5" drives if you need that one day)
+ integrated 80mm fan (probably not very noisy, and you can replace it with a bigger one if you have the space)
+ variable fan speed (front button)
+ poweron/off front button
- underside fan (you need to keep space below, it probably does not work correctly if you stack multiple of them).

* StarTech HSB110SATBK 
[ http://www.startech.com/HDD/Mobile-Racks/525in-Trayless-Hot-Swap-Mobile-Rack-for-35in-SATA-HDD-with-LCD-and-Fan~HSB110SATBK ]
+ trayless
+ can get 3.5" hard drives
+ integrated 40mm bottom fan
+ you can stack multiple one (bottom fan)
+ screen displaying hard drive temperature and fan speed
+ can adapt automatically fan speed according to hard drive temperature (plus some button to manage that) and buzz some alarm in case of big problem
- the fan is probably very noisy (very small, like the one you find in rackable servers)
- this bay costs twice the price (compared to the other one)

I've bought the first one and I will receive it soon. If you have bought some of these 8Tb Archive hard drives and want to be able to hotplug them, it's probably the only solution available to you currently. I chose the first one since I only need to power it at backup time, and I can just put the fan to max speed during this time, and monitoring hard drive temperature using SMART. If I got some issues I will tell you.

I hope the kernel will receive updates soon to speed-up the writing, but with that kind of bay, you can already hotplug your hard drive today. You'll probably have to wait more more longer to be able to use them on some USB 3/e-SATA docks which is not an option at all today.

After the end of my current test (in some days), I will try to reproduce it on a blank disk using a newer kernel.
Comment 100 shan 2016-01-06 03:04:57 UTC
Sounds just like my case.

I managed to get my drive to somehow working state after creating proper partition table. Fdisk and parted made partitions which always gave warning about alignment? "Warning: The resulting partition is not properly aligned for best performance.", and always performance was terrible, and that unmounting took ages.

After I created partition table with gparted, alignment was ok and drive worked quite well,
  (parted) align-check opt 1                                                
  1 aligned

You can check the above command with qparted and see is it aligned.
Comment 101 Thomas DEBESSE 2016-01-06 04:04:12 UTC
I created my partition with alignment in mind :)

root@box:~# cat /sys/block/sdk/queue/physical_block_size
4096
root@box:~# cat /sys/block/sdk/queue/optimal_io_size 
0
root@box:~# cat /sys/block/sdk/queue/minimum_io_size 
4096
root@box:~# cat /sys/block/sdk/alignment_offset 
0

root@box:~# parted /dev/sdk
GNU Parted 3.2
Using /dev/sdk
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart primary 4096s 100%
(parted) align-check optimal 1
1 aligned
(parted) unit TB
(parted) print
Model: ATA ST8000AS0002-1NA (scsi)
Disk /dev/sdk: 8,00TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      0,00TB  8,00TB  8,00TB               primary


I finally discovered something wrong in my computer case that can prevent me to fit correctly the Icybox bay so I canceled the Icybox order I did to buy the Startech one. I hope it's OK since the Startech one is an older product that was designed years before 8Tb drives exist (and on his side, Icybox officially claims 8Tb drive support), and since I already found incompatible stuff with these 8Tb Seagate drives (screw holes incompatible with my computer case internal trays), I'm a bit apprehensive.

On his side, the drive already wrote 3Tb without any errors, with a rough average 50MB/s writing rate, which is the rate Henk Slager had with a 4.3 kernel patched with Martin K. Petersen changes. On my side I'm still running a 3.18 kernel, I will try a 4.4-rc7 one soon.
Comment 102 Christoph Anton Mitterer 2016-01-06 06:25:07 UTC
(More senior members, correct me if I'm wrong with the following)

But guys,... this bug isn't about block layer alignment optimisations, is it?
If one doesn't align the blocks properly, one likely gets some warnings and bad performance. There are many resources out there how to do it right... and most current tools will do everything right out of the box (by auto-detection) or simply fall back to some smart alignment values (e.g. 2048S) which also work out-of-the box for most cases.

What this bug, AFAIU, is about: actual data corruptions, block errors, that happen on these SMR devices (typically the 8TB one from Seagate).

So if you still see block errors / corruptions like the original posters did - even though you have the patch in place - just post.
If you see other corruption errors or completely unexpected performance issues, which may be related to SMR... open a new issue, reference it here.

But we probably don't need message from everyone who ever sees the typical "process blocked for 120s" message.

Best,
:-)
Comment 103 Thomas DEBESSE 2016-01-06 06:53:11 UTC
Well, my comment shows one use case with I/O errors while creating a file-system on the drive and one use case where the drive go wrong for real, needing an hardware shutdown of the drive while the file-system is mounted.

I wrote my comment because I read some messages above about people playing with these drives with external (i.e. not internal SATA) link, and I think it was useful to assess the situation. The same way we probably don't need "process blocked for 120s" messages by everyone, we probably don't need reports on non-SATA use cases. My intent was to point the fact that until theses drives work correctly on SATA link, it's pointless to try to get them working with other links through various interface. It seems clear that the fixes to get them working over SATA are not sufficient to get them working over something else, so if someone wants to try some new kernel or patch, he must know his test is only valid over SATA cable, otherwise he can mark out as faulty a patch that works.

Also, I added (and yes it was free and it fixes nothing upstream) a comment to help other people who can't use their drives today for the usage they bought them for (currently, since I can't use my drives for what I've bought them for, my drives are just expensive paperweights). While waiting for fixes being mainlined, it's good to share functional workarounds. It looks like some people have expensive paperweights since 10 months.

So to quote yourself:

> So either the difference is btrfs, or something in Debian...
> or perhaps that I'm always running these devices via a USB/SATA
> bridge (with USB3.0 though)

Yes, using an USB/SATA bridge adds some other issues that can hide fixes, it's good to know this use case it's not a valid test (proving that is the main purpose of my comment, which is precious).
Comment 104 *cJ* 2016-01-06 07:14:44 UTC
Hi,

Here I have one 8 TB in SATA and another one that's the external version, which is UAS USB3.0.
My use case is backup and archival (rsync, attic/borg backup, snapraid).
Filesystem is btrfs, and we also have an AES-encrypted DM in the mix.

The one thing I can tell for now, is that running 4.4.0-rc8 I still get errors such as:

UAS:

[ 4554.977414] sd 18:0:0:0: [sdt] tag#18 data cmplt err -71 uas-tag 20 inflight: CMD 
[ 4554.978207] sd 18:0:0:0: [sdt] tag#18 CDB: Write(10) 2a 00 50 28 d3 c5 00 00 80 00
[ 4554.982406] xhci_hcd 0000:00:14.0: ERROR Transfer event for disabled endpoint or incorrect stream ring
[ 4554.983193] xhci_hcd 0000:00:14.0: @00000000d89916f0 00000000 00000000 04000000 0b048001

SATA, infinite IO to offline device (btrfs issue?):

Jan 06 00:16:05 Bidule systemd-journald[28106]: Missed 8 kernel messages
Jan 06 00:16:05 Bidule kernel: sd 14:0:0:7: rejecting I/O to offline device
Jan 06 00:16:05 Bidule systemd-journald[28106]: Missed 9 kernel messages
Jan 06 00:16:05 Bidule kernel: sd 14:0:0:7: rejecting I/O to offline device

On top of the usual stuff which has been reported earlier.


And I've always been able to perform small transactions (which leads me to think that the mkfs issues could be unrelated); all the trouble I have is whenever I do lots of writes (dozens / hundrds of GB) on the devices.

So far I haven't been able to fill one of these in a single shot, except when throttling the write speed to about 20 MB/s, which would give time to the disks to do their garbage collection.
Comment 105 Christoph Anton Mitterer 2016-01-06 07:22:44 UTC
@Thomas DEBESSE: As I've said, if people still actually see corruptions I have absoltely nothing against them posting. To the contrary, I'm happy that people actively test and report.

I rather meant the the growing number of "it's just slow when alignment doesn't fit reports",... and even to those people I didn't mean it offensive,.. it just easily clutters up this bug more (as my own two messages now as well ^^) and makes it less likely that the actual issues get fixed.

:-)
Comment 106 sweeneygj 2016-01-06 16:52:03 UTC
(In reply to Thomas DEBESSE from comment #99)
>
> It's a very bad idea to use a dock with this kind of hard drives and does
> not work at all. I tried 3.18.0, 3.18.21, 4.3.3 and 4.4.0-rc7 kernels with
> my hard drive dock without success.

IIRC 4.4.0-rc3 is the first fully patched kernel with respect to this bug report.
That is, it contains the reversion comment #21 and patch comment #67.

Prior to this, 3.18.21 is reported ok in comment #20.

Between 3.18.21 and 4.4.0-rc3 you should expect bugs.

Edgestore DAS200 via e-Sata and ICY DOCK MB971SP-B working fine using m/b and drive in comment #16.
Comment 107 Futuref 2016-01-07 13:15:51 UTC
(In reply to sweeneygj from comment #106)
> (In reply to Thomas DEBESSE from comment #99)
> >
> > It's a very bad idea to use a dock with this kind of hard drives and does
> > not work at all. I tried 3.18.0, 3.18.21, 4.3.3 and 4.4.0-rc7 kernels with
> > my hard drive dock without success.
> 
> IIRC 4.4.0-rc3 is the first fully patched kernel with respect to this bug
> report.
> That is, it contains the reversion comment #21 and patch comment #67.
> 
> Prior to this, 3.18.21 is reported ok in comment #20.
> 
> Between 3.18.21 and 4.4.0-rc3 you should expect bugs.
> 
> Edgestore DAS200 via e-Sata and ICY DOCK MB971SP-B working fine using m/b
> and drive in comment #16.

Thats correct, I am using this drive quite successfully nowadays in the meaning that I am able to write many gigabytes of data on it (Ubuntu Server 15.10, Kernel 4.2 over SATA). But still I have a bad feeling about some weird "click" and "clack" noises of the drive while active.

Might these noises have to do with this bug ticket? Like powerstate is corrupted or prevented? Or other issue in parking the head? The noises are quite loud...
Comment 108 alex.wellmann 2016-01-15 00:16:01 UTC
I have Upgraded to 4.4.0 running Ubuntu server 15.10 but I am still receiving errors. I copied a couple hundred gigs from one 8tb drive to the other (mirror) and received the errors also with the final 4.4.0 release (Ubuntu mainline kernel) 

> [2205462.195961] ata4.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x40000
> action 0x6 frozen
> [2205462.196016] ata4: SError: { CommWake }
> [2205462.196041] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.196077] ata4.00: cmd 61/00:00:c0:17:7a/15:00:65:03:00/40 tag 0 ncq
> 2752512 out
> [2205462.196169] ata4.00: status: { DRDY }
> [2205462.196192] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.196226] ata4.00: cmd 61/c0:08:c0:2c:7a/0f:00:65:03:00/40 tag 1 ncq
> 2064384 out
> [2205462.196325] ata4.00: status: { DRDY }
> [2205462.196353] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.196387] ata4.00: cmd 61/c0:10:80:3c:7a/10:00:65:03:00/40 tag 2 ncq
> 2195456 out
> [2205462.196478] ata4.00: status: { DRDY }
> [2205462.196501] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.196545] ata4.00: cmd 61/40:18:40:4d:7a/1a:00:65:03:00/40 tag 3 ncq
> 3440640 out
> [2205462.196641] ata4.00: status: { DRDY }
> [2205462.196664] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.196699] ata4.00: cmd 61/c0:20:80:67:7a/0f:00:65:03:00/40 tag 4 ncq
> 2064384 out
> [2205462.196798] ata4.00: status: { DRDY }
> [2205462.196821] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.196856] ata4.00: cmd 61/c0:28:40:77:7a/10:00:65:03:00/40 tag 5 ncq
> 2195456 out
> [2205462.196959] ata4.00: status: { DRDY }
> [2205462.196982] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.197015] ata4.00: cmd 61/40:30:00:88:7a/05:00:65:03:00/40 tag 6 ncq
> 688128 out
> [2205462.197111] ata4.00: status: { DRDY }
> [2205462.197133] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.197167] ata4.00: cmd 61/c0:38:40:8d:7a/0f:00:65:03:00/40 tag 7 ncq
> 2064384 out
> [2205462.197265] ata4.00: status: { DRDY }
> [2205462.198705] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.200183] ata4.00: cmd 61/c0:40:00:9d:7a/0f:00:65:03:00/40 tag 8 ncq
> 2064384 out
> [2205462.203220] ata4.00: status: { DRDY }
> [2205462.204781] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.206351] ata4.00: cmd 61/00:48:c0:ac:7a/15:00:65:03:00/40 tag 9 ncq
> 2752512 out
> [2205462.209303] ata4.00: status: { DRDY }
> [2205462.210753] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.212222] ata4.00: cmd 61/c0:50:c0:c1:7a/10:00:65:03:00/40 tag 10 ncq
> 2195456 out
> [2205462.215086] ata4.00: status: { DRDY }
> [2205462.216540] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.217948] ata4.00: cmd 61/80:58:80:d2:7a/0a:00:65:03:00/40 tag 11 ncq
> 1376256 out
> [2205462.220719] ata4.00: status: { DRDY }
> [2205462.222121] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.223530] ata4.00: cmd 61/40:60:00:dd:7a/1a:00:65:03:00/40 tag 12 ncq
> 3440640 out
> [2205462.226338] ata4.00: status: { DRDY }
> [2205462.227740] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.229192] ata4.00: cmd 61/00:68:40:f7:7a/16:00:65:03:00/40 tag 13 ncq
> 2883584 out
> [2205462.232029] ata4.00: status: { DRDY }
> [2205462.233456] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.234845] ata4.00: cmd 61/80:70:40:0d:7b/0a:00:65:03:00/40 tag 14 ncq
> 1376256 out
> [2205462.237650] ata4.00: status: { DRDY }
> [2205462.239061] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.240490] ata4.00: cmd 61/80:78:c0:17:7b/0a:00:65:03:00/40 tag 15 ncq
> 1376256 out
> [2205462.243305] ata4.00: status: { DRDY }
> [2205462.244720] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.246136] ata4.00: cmd 61/c0:80:40:22:7b/0f:00:65:03:00/40 tag 16 ncq
> 2064384 out
> [2205462.248960] ata4.00: status: { DRDY }
> [2205462.250386] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.251807] ata4.00: cmd 61/80:88:00:32:7b/0a:00:65:03:00/40 tag 17 ncq
> 1376256 out
> [2205462.254671] ata4.00: status: { DRDY }
> [2205462.256134] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.257559] ata4.00: cmd 61/40:90:80:3c:7b/1b:00:65:03:00/40 tag 18 ncq
> 3571712 out
> [2205462.260402] ata4.00: status: { DRDY }
> [2205462.261860] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.263270] ata4.00: cmd 61/80:98:c0:57:7b/1f:00:65:03:00/40 tag 19 ncq
> 4128768 out
> [2205462.266156] ata4.00: status: { DRDY }
> [2205462.267611] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.269098] ata4.00: cmd 61/c0:a0:40:77:7b/10:00:65:03:00/40 tag 20 ncq
> 2195456 out
> [2205462.271996] ata4.00: status: { DRDY }
> [2205462.273464] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.274903] ata4.00: cmd 61/80:a8:00:88:7b/1f:00:65:03:00/40 tag 21 ncq
> 4128768 out
> [2205462.277773] ata4.00: status: { DRDY }
> [2205462.279221] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.280689] ata4.00: cmd 61/00:b0:80:a7:7b/15:00:65:03:00/40 tag 22 ncq
> 2752512 out
> [2205462.283596] ata4.00: status: { DRDY }
> [2205462.285105] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.286549] ata4.00: cmd 61/40:b8:80:bc:7b/1b:00:65:03:00/40 tag 23 ncq
> 3571712 out
> [2205462.289449] ata4.00: status: { DRDY }
> [2205462.290890] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.292356] ata4.00: cmd 61/00:c0:c0:d7:7b/15:00:65:03:00/40 tag 24 ncq
> 2752512 out
> [2205462.295230] ata4.00: status: { DRDY }
> [2205462.296700] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.298160] ata4.00: cmd 61/c0:c8:c0:ec:7b/0f:00:65:03:00/40 tag 25 ncq
> 2064384 out
> [2205462.301025] ata4.00: status: { DRDY }
> [2205462.302466] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.303945] ata4.00: cmd 61/80:d0:80:fc:7b/0a:00:65:03:00/40 tag 26 ncq
> 1376256 out
> [2205462.306827] ata4.00: status: { DRDY }
> [2205462.308316] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.309749] ata4.00: cmd 61/00:d8:00:07:7c/01:00:65:03:00/40 tag 27 ncq
> 131072 out
> [2205462.312637] ata4.00: status: { DRDY }
> [2205462.314075] ata4.00: failed command: READ FPDMA QUEUED
> [2205462.315488] ata4.00: cmd 60/08:e0:78:08:40/00:00:65:03:00/40 tag 28 ncq
> 4096 in
> [2205462.318365] ata4.00: status: { DRDY }
> [2205462.319800] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.321265] ata4.00: cmd 61/00:e8:80:bc:79/2b:00:65:03:00/40 tag 29 ncq
> 5636096 out
> [2205462.324201] ata4.00: status: { DRDY }
> [2205462.325677] ata4.00: failed command: WRITE FPDMA QUEUED
> [2205462.327085] ata4.00: cmd 61/40:f0:80:e7:79/30:00:65:03:00/40 tag 30 ncq
> 6324224 out
> [2205462.329936] ata4.00: status: { DRDY }
> [2205462.331377] ata4: hard resetting link
> [2205462.648331] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [2205462.999696] ata4.00: configured for UDMA/133
> [2205463.012736] ata4.00: device reported invalid CHS sector 0
> [2205463.012739] ata4.00: device reported invalid CHS sector 0
> [2205463.012740] ata4.00: device reported invalid CHS sector 0
> [2205463.012742] ata4.00: device reported invalid CHS sector 0
> [2205463.012744] ata4.00: device reported invalid CHS sector 0
> [2205463.012746] ata4.00: device reported invalid CHS sector 0
> [2205463.012748] ata4.00: device reported invalid CHS sector 0
> [2205463.012750] ata4.00: device reported invalid CHS sector 0
> [2205463.012752] ata4.00: device reported invalid CHS sector 0
> [2205463.012755] ata4.00: device reported invalid CHS sector 0
> [2205463.012757] ata4.00: device reported invalid CHS sector 0
> [2205463.012758] ata4.00: device reported invalid CHS sector 0
> [2205463.012760] ata4.00: device reported invalid CHS sector 0
> [2205463.012762] ata4.00: device reported invalid CHS sector 0
> [2205463.012764] ata4.00: device reported invalid CHS sector 0
> [2205463.012765] ata4.00: device reported invalid CHS sector 0
> [2205463.012767] ata4.00: device reported invalid CHS sector 0
> [2205463.012769] ata4.00: device reported invalid CHS sector 0
> [2205463.012770] ata4.00: device reported invalid CHS sector 0
> [2205463.012771] ata4.00: device reported invalid CHS sector 0
> [2205463.012773] ata4.00: device reported invalid CHS sector 0
> [2205463.012775] ata4.00: device reported invalid CHS sector 0
> [2205463.012776] ata4.00: device reported invalid CHS sector 0
> [2205463.012777] ata4.00: device reported invalid CHS sector 0
> [2205463.012779] ata4.00: device reported invalid CHS sector 0
> [2205463.012780] ata4.00: device reported invalid CHS sector 0
> [2205463.012791] ata4: EH complete
> [2290840.293930] ata3.00: exception Emask 0x0 SAct 0x6fffffff SErr 0x40000
> action 0x6 frozen
> [2290840.295458] ata3: SError: { CommWake }
> [2290840.296948] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.298488] ata3.00: cmd 61/40:00:40:b7:09/05:00:47:03:00/40 tag 0 ncq
> 688128 out
> [2290840.301335] ata3.00: status: { DRDY }
> [2290840.302756] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.304163] ata3.00: cmd 61/40:08:80:bc:09/05:00:47:03:00/40 tag 1 ncq
> 688128 out
> [2290840.307015] ata3.00: status: { DRDY }
> [2290840.308423] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.309852] ata3.00: cmd 61/40:10:c0:c1:09/06:00:47:03:00/40 tag 2 ncq
> 819200 out
> [2290840.312684] ata3.00: status: { DRDY }
> [2290840.314093] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.315493] ata3.00: cmd 61/40:18:00:c8:09/05:00:47:03:00/40 tag 3 ncq
> 688128 out
> [2290840.318324] ata3.00: status: { DRDY }
> [2290840.319725] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.321114] ata3.00: cmd 61/40:20:40:cd:09/05:00:47:03:00/40 tag 4 ncq
> 688128 out
> [2290840.323937] ata3.00: status: { DRDY }
> [2290840.325347] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.326738] ata3.00: cmd 61/40:28:80:d2:09/05:00:47:03:00/40 tag 5 ncq
> 688128 out
> [2290840.329543] ata3.00: status: { DRDY }
> [2290840.330936] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.332318] ata3.00: cmd 61/40:30:c0:d7:09/05:00:47:03:00/40 tag 6 ncq
> 688128 out
> [2290840.335121] ata3.00: status: { DRDY }
> [2290840.336522] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.337906] ata3.00: cmd 61/40:38:00:dd:09/05:00:47:03:00/40 tag 7 ncq
> 688128 out
> [2290840.340697] ata3.00: status: { DRDY }
> [2290840.342086] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.343460] ata3.00: cmd 61/40:40:40:e2:09/05:00:47:03:00/40 tag 8 ncq
> 688128 out
> [2290840.346251] ata3.00: status: { DRDY }
> [2290840.347646] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.349021] ata3.00: cmd 61/40:48:80:e7:09/05:00:47:03:00/40 tag 9 ncq
> 688128 out
> [2290840.351810] ata3.00: status: { DRDY }
> [2290840.353190] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.354567] ata3.00: cmd 61/40:50:c0:ec:09/05:00:47:03:00/40 tag 10 ncq
> 688128 out
> [2290840.357348] ata3.00: status: { DRDY }
> [2290840.358728] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.360116] ata3.00: cmd 61/40:58:00:f2:09/05:00:47:03:00/40 tag 11 ncq
> 688128 out
> [2290840.362871] ata3.00: status: { DRDY }
> [2290840.364263] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.365643] ata3.00: cmd 61/40:60:40:f7:09/05:00:47:03:00/40 tag 12 ncq
> 688128 out
> [2290840.368400] ata3.00: status: { DRDY }
> [2290840.369797] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.371157] ata3.00: cmd 61/80:68:80:fc:09/0b:00:47:03:00/40 tag 13 ncq
> 1507328 out
> [2290840.373975] ata3.00: status: { DRDY }
> [2290840.375415] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.376774] ata3.00: cmd 61/80:70:00:08:0a/0a:00:47:03:00/40 tag 14 ncq
> 1376256 out
> [2290840.379541] ata3.00: status: { DRDY }
> [2290840.380956] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.382363] ata3.00: cmd 61/80:78:80:12:0a/0a:00:47:03:00/40 tag 15 ncq
> 1376256 out
> [2290840.385189] ata3.00: status: { DRDY }
> [2290840.386562] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.387974] ata3.00: cmd 61/40:80:00:1d:0a/05:00:47:03:00/40 tag 16 ncq
> 688128 out
> [2290840.390772] ata3.00: status: { DRDY }
> [2290840.392185] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.393557] ata3.00: cmd 61/40:88:40:22:0a/05:00:47:03:00/40 tag 17 ncq
> 688128 out
> [2290840.396328] ata3.00: status: { DRDY }
> [2290840.397751] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.399167] ata3.00: cmd 61/40:90:80:27:0a/05:00:47:03:00/40 tag 18 ncq
> 688128 out
> [2290840.401975] ata3.00: status: { DRDY }
> [2290840.403419] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.404791] ata3.00: cmd 61/40:98:c0:2c:0a/05:00:47:03:00/40 tag 19 ncq
> 688128 out
> [2290840.407626] ata3.00: status: { DRDY }
> [2290840.409048] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.410425] ata3.00: cmd 61/40:a0:00:32:0a/05:00:47:03:00/40 tag 20 ncq
> 688128 out
> [2290840.413253] ata3.00: status: { DRDY }
> [2290840.414635] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.416067] ata3.00: cmd 61/40:a8:40:37:0a/05:00:47:03:00/40 tag 21 ncq
> 688128 out
> [2290840.418855] ata3.00: status: { DRDY }
> [2290840.420279] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.421657] ata3.00: cmd 61/40:b0:80:3c:0a/05:00:47:03:00/40 tag 22 ncq
> 688128 out
> [2290840.424496] ata3.00: status: { DRDY }
> [2290840.425938] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.427335] ata3.00: cmd 61/40:b8:c0:41:0a/05:00:47:03:00/40 tag 23 ncq
> 688128 out
> [2290840.430123] ata3.00: status: { DRDY }
> [2290840.431563] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.432942] ata3.00: cmd 61/00:c0:00:47:0a/01:00:47:03:00/40 tag 24 ncq
> 131072 out
> [2290840.435743] ata3.00: status: { DRDY }
> [2290840.437165] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.438545] ata3.00: cmd 61/40:c8:00:48:0a/05:00:47:03:00/40 tag 25 ncq
> 688128 out
> [2290840.441395] ata3.00: status: { DRDY }
> [2290840.442777] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.444219] ata3.00: cmd 61/40:d0:40:4d:0a/05:00:47:03:00/40 tag 26 ncq
> 688128 out
> [2290840.447036] ata3.00: status: { DRDY }
> [2290840.448467] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.449870] ata3.00: cmd 61/40:d8:80:52:0a/05:00:47:03:00/40 tag 27 ncq
> 688128 out
> [2290840.452655] ata3.00: status: { DRDY }
> [2290840.454115] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.455548] ata3.00: cmd 61/00:e8:40:4d:09/40:00:47:03:00/40 tag 29 ncq
> 8388608 out
> [2290840.458333] ata3.00: status: { DRDY }
> [2290840.459779] ata3.00: failed command: WRITE FPDMA QUEUED
> [2290840.461156] ata3.00: cmd 61/00:f0:40:8d:09/2a:00:47:03:00/40 tag 30 ncq
> 5505024 out
> [2290840.463967] ata3.00: status: { DRDY }
> [2290840.465392] ata3: hard resetting link
> [2290840.782226] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [2290841.634966] ata3.00: configured for UDMA/133
> [2290841.651034] ata3.00: device reported invalid CHS sector 0
> [2290841.651038] ata3.00: device reported invalid CHS sector 0
> [2290841.651040] ata3.00: device reported invalid CHS sector 0
> [2290841.651042] ata3.00: device reported invalid CHS sector 0
> [2290841.651044] ata3.00: device reported invalid CHS sector 0
> [2290841.651045] ata3.00: device reported invalid CHS sector 0
> [2290841.651047] ata3.00: device reported invalid CHS sector 0
> [2290841.651049] ata3.00: device reported invalid CHS sector 0
> [2290841.651051] ata3.00: device reported invalid CHS sector 0
> [2290841.651053] ata3.00: device reported invalid CHS sector 0
> [2290841.651054] ata3.00: device reported invalid CHS sector 0
> [2290841.651056] ata3.00: device reported invalid CHS sector 0
> [2290841.651057] ata3.00: device reported invalid CHS sector 0
> [2290841.651058] ata3.00: device reported invalid CHS sector 0
> [2290841.651060] ata3.00: device reported invalid CHS sector 0
> [2290841.651062] ata3.00: device reported invalid CHS sector 0
> [2290841.651063] ata3.00: device reported invalid CHS sector 0
> [2290841.651065] ata3.00: device reported invalid CHS sector 0
> [2290841.651067] ata3.00: device reported invalid CHS sector 0
> [2290841.651068] ata3.00: device reported invalid CHS sector 0
> [2290841.651070] ata3.00: device reported invalid CHS sector 0
> [2290841.651071] ata3.00: device reported invalid CHS sector 0
> [2290841.651073] ata3.00: device reported invalid CHS sector 0
> [2290841.651075] ata3.00: device reported invalid CHS sector 0
> [2290841.651076] ata3.00: device reported invalid CHS sector 0
> [2290841.651078] ata3.00: device reported invalid CHS sector 0
> [2290841.651079] ata3.00: device reported invalid CHS sector 0
> [2290841.651081] ata3.00: device reported invalid CHS sector 0
> [2290841.651083] ata3.00: device reported invalid CHS sector 0
> [2290841.651084] ata3.00: device reported invalid CHS sector 0
> [2290841.651096] ata3: EH complete
> [2291004.456724] ata3.00: exception Emask 0x0 SAct 0x7ff7ffff SErr 0x40000
> action 0x6 frozen
> [2291004.458933] ata3: SError: { CommWake }
> [2291004.461146] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.463344] ata3.00: cmd 61/40:00:00:bd:20/05:00:64:03:00/40 tag 0 ncq
> 688128 out
> [2291004.467727] ata3.00: status: { DRDY }
> [2291004.469916] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.472076] ata3.00: cmd 61/c0:08:40:c2:20/05:00:64:03:00/40 tag 1 ncq
> 753664 out
> [2291004.476444] ata3.00: status: { DRDY }
> [2291004.478409] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.480337] ata3.00: cmd 61/40:10:00:c8:20/05:00:64:03:00/40 tag 2 ncq
> 688128 out
> [2291004.484237] ata3.00: status: { DRDY }
> [2291004.486184] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.487943] ata3.00: cmd 61/40:18:40:cd:20/05:00:64:03:00/40 tag 3 ncq
> 688128 out
> [2291004.491464] ata3.00: status: { DRDY }
> [2291004.493217] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.494955] ata3.00: cmd 61/40:20:80:d2:20/05:00:64:03:00/40 tag 4 ncq
> 688128 out
> [2291004.498480] ata3.00: status: { DRDY }
> [2291004.500091] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.501686] ata3.00: cmd 61/40:28:c0:d7:20/05:00:64:03:00/40 tag 5 ncq
> 688128 out
> [2291004.504897] ata3.00: status: { DRDY }
> [2291004.506492] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.508080] ata3.00: cmd 61/40:30:00:dd:20/05:00:64:03:00/40 tag 6 ncq
> 688128 out
> [2291004.511299] ata3.00: status: { DRDY }
> [2291004.512800] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.514264] ata3.00: cmd 61/c0:38:40:e2:20/0f:00:64:03:00/40 tag 7 ncq
> 2064384 out
> [2291004.517233] ata3.00: status: { DRDY }
> [2291004.518703] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.520162] ata3.00: cmd 61/40:40:00:f2:20/05:00:64:03:00/40 tag 8 ncq
> 688128 out
> [2291004.523131] ata3.00: status: { DRDY }
> [2291004.524538] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.525930] ata3.00: cmd 61/40:48:40:f7:20/05:00:64:03:00/40 tag 9 ncq
> 688128 out
> [2291004.528736] ata3.00: status: { DRDY }
> [2291004.530121] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.531502] ata3.00: cmd 61/80:50:80:fc:20/0b:00:64:03:00/40 tag 10 ncq
> 1507328 out
> [2291004.534320] ata3.00: status: { DRDY }
> [2291004.535713] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.537091] ata3.00: cmd 61/40:58:00:08:21/05:00:64:03:00/40 tag 11 ncq
> 688128 out
> [2291004.539830] ata3.00: status: { DRDY }
> [2291004.541197] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.542558] ata3.00: cmd 61/c0:60:40:0d:21/0f:00:64:03:00/40 tag 12 ncq
> 2064384 out
> [2291004.545327] ata3.00: status: { DRDY }
> [2291004.546702] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.548065] ata3.00: cmd 61/40:68:00:1d:21/05:00:64:03:00/40 tag 13 ncq
> 688128 out
> [2291004.550826] ata3.00: status: { DRDY }
> [2291004.552195] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.553576] ata3.00: cmd 61/40:70:40:22:21/05:00:64:03:00/40 tag 14 ncq
> 688128 out
> [2291004.556337] ata3.00: status: { DRDY }
> [2291004.557716] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.559108] ata3.00: cmd 61/40:78:80:27:21/05:00:64:03:00/40 tag 15 ncq
> 688128 out
> [2291004.561879] ata3.00: status: { DRDY }
> [2291004.563254] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.564634] ata3.00: cmd 61/40:80:c0:2c:21/05:00:64:03:00/40 tag 16 ncq
> 688128 out
> [2291004.567406] ata3.00: status: { DRDY }
> [2291004.568789] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.570184] ata3.00: cmd 61/40:88:00:32:21/05:00:64:03:00/40 tag 17 ncq
> 688128 out
> [2291004.572964] ata3.00: status: { DRDY }
> [2291004.574342] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.575718] ata3.00: cmd 61/40:90:40:37:21/05:00:64:03:00/40 tag 18 ncq
> 688128 out
> [2291004.578506] ata3.00: status: { DRDY }
> [2291004.579886] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.581273] ata3.00: cmd 61/40:a0:c0:41:20/30:00:64:03:00/40 tag 20 ncq
> 6324224 out
> [2291004.584073] ata3.00: status: { DRDY }
> [2291004.585460] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.586841] ata3.00: cmd 61/80:a8:00:72:20/0a:00:64:03:00/40 tag 21 ncq
> 1376256 out
> [2291004.589635] ata3.00: status: { DRDY }
> [2291004.591021] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.592405] ata3.00: cmd 61/40:b0:80:7c:20/05:00:64:03:00/40 tag 22 ncq
> 688128 out
> [2291004.595213] ata3.00: status: { DRDY }
> [2291004.596618] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.598007] ata3.00: cmd 61/40:b8:c0:81:20/06:00:64:03:00/40 tag 23 ncq
> 819200 out
> [2291004.600806] ata3.00: status: { DRDY }
> [2291004.602191] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.603575] ata3.00: cmd 61/40:c0:00:88:20/1a:00:64:03:00/40 tag 24 ncq
> 3440640 out
> [2291004.606378] ata3.00: status: { DRDY }
> [2291004.607783] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.609173] ata3.00: cmd 61/c0:c8:40:a2:20/05:00:64:03:00/40 tag 25 ncq
> 753664 out
> [2291004.611969] ata3.00: status: { DRDY }
> [2291004.613360] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.614750] ata3.00: cmd 61/18:d0:20:08:00/00:00:64:03:00/40 tag 26 ncq
> 12288 out
> [2291004.617570] ata3.00: status: { DRDY }
> [2291004.618960] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.620338] ata3.00: cmd 61/40:d8:00:a8:20/05:00:64:03:00/40 tag 27 ncq
> 688128 out
> [2291004.623125] ata3.00: status: { DRDY }
> [2291004.624555] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.625934] ata3.00: cmd 61/40:e0:40:ad:20/05:00:64:03:00/40 tag 28 ncq
> 688128 out
> [2291004.628807] ata3.00: status: { DRDY }
> [2291004.630252] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.631628] ata3.00: cmd 61/40:e8:80:b2:20/05:00:64:03:00/40 tag 29 ncq
> 688128 out
> [2291004.634408] ata3.00: status: { DRDY }
> [2291004.635822] ata3.00: failed command: WRITE FPDMA QUEUED
> [2291004.637199] ata3.00: cmd 61/40:f0:c0:b7:20/05:00:64:03:00/40 tag 30 ncq
> 688128 out
> [2291004.640013] ata3.00: status: { DRDY }
> [2291004.641419] ata3: hard resetting link
> [2291004.961025] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [2291004.963953] ata3.00: configured for UDMA/133
> [2291004.977210] ata3.00: device reported invalid CHS sector 0
> [2291004.977214] ata3.00: device reported invalid CHS sector 0
> [2291004.977215] ata3.00: device reported invalid CHS sector 0
> [2291004.977218] ata3.00: device reported invalid CHS sector 0
> [2291004.977219] ata3.00: device reported invalid CHS sector 0
> [2291004.977221] ata3.00: device reported invalid CHS sector 0
> [2291004.977223] ata3.00: device reported invalid CHS sector 0
> [2291004.977225] ata3.00: device reported invalid CHS sector 0
> [2291004.977227] ata3.00: device reported invalid CHS sector 0
> [2291004.977229] ata3.00: device reported invalid CHS sector 0
> [2291004.977230] ata3.00: device reported invalid CHS sector 0
> [2291004.977232] ata3.00: device reported invalid CHS sector 0
> [2291004.977234] ata3.00: device reported invalid CHS sector 0
> [2291004.977235] ata3.00: device reported invalid CHS sector 0
> [2291004.977237] ata3.00: device reported invalid CHS sector 0
> [2291004.977239] ata3.00: device reported invalid CHS sector 0
> [2291004.977241] ata3.00: device reported invalid CHS sector 0
> [2291004.977242] ata3.00: device reported invalid CHS sector 0
> [2291004.977243] ata3.00: device reported invalid CHS sector 0
> [2291004.977245] ata3.00: device reported invalid CHS sector 0
> [2291004.977247] ata3.00: device reported invalid CHS sector 0
> [2291004.977248] ata3.00: device reported invalid CHS sector 0
> [2291004.977250] ata3.00: device reported invalid CHS sector 0
> [2291004.977252] ata3.00: device reported invalid CHS sector 0
> [2291004.977254] ata3.00: device reported invalid CHS sector 0
> [2291004.977255] ata3.00: device reported invalid CHS sector 0
> [2291004.977258] ata3.00: device reported invalid CHS sector 0
> [2291004.977259] ata3.00: device reported invalid CHS sector 0
> [2291004.977261] ata3.00: device reported invalid CHS sector 0
> [2291004.977273] ata3: EH complete
Comment 109 alex.wellmann 2016-01-15 00:18:40 UTC
...I am using ext4 fs...(In reply to alex.wellmann from comment #108)
Comment 110 Thomas DEBESSE 2016-01-15 01:06:01 UTC
Are you sure your drive is not merely dead?

What is the output of `smartctl -l error /dev/sda` and `smartctl -H /dev/sda` 
(adapt `/dev/sda` with your own drive path)?

Here, these drives work well on 4.4.0 kernel from ubuntu mainline when used on direct SATA link (I never seen them working on USB3/e-SATA links).

Your drive is probably just dead. I don't know if it can be a problem on the SATA chipset side but it really looks like an hardware fault on the hard drive side.
Comment 111 alex.wellmann 2016-01-15 02:06:00 UTC
That's also what i thought (hardware failure) but the drives work fine on Windows without any issues and are Connected via SATA directly to the MB (Intel C226 chipset). Smartctl also shows no issues for both drives:

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.4.0-040400-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged


smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.4.0-040400-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


I've also replaced all SATA cables before I knew that this probably software related...
Comment 112 Thomas DEBESSE 2016-01-15 02:17:46 UTC
Oh yes I found similar lines in a Linux 4.3 dmesg I recorded some days before (but I'm ok with 4.4):

```
[ 9878.059465] ata7: SError: { 10B8B Handshk }
[ 9878.059467] ata7.00: failed command: WRITE FPDMA QUEUED
[ 9878.059470] ata7.00: cmd 61/fa:68:08:10:00/00:00:00:00:00/40 tag 13 ncq 128000 out
                        res 40/00:68:08:10:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9878.059472] ata7.00: status: { DRDY }
```

So it looks software related…
Comment 113 sweeneygj 2016-01-15 14:29:44 UTC
(In reply to alex.wellmann from comment #108)
> I have Upgraded to 4.4.0 running Ubuntu server 15.10 but I am still
> receiving errors. 

Not good news.

> > [2290841.651034] ata3.00: device reported invalid CHS sector 0

See comment #23.

You were running 4.1, comment #72, and now run 4.4.0-040400-generic 

For both SMR drive can you post

smartctl -A /dev/sdX 

Assuming they have a single partition:

parted /dev/sdX align-check optimal 1

Can you shred the 2nd SMR drive with different sizes (10G/100G/500G/...)?

shred -v -s 10G -n 1 

Using 4.4.0-rc4 here with snapraid parity builds and no errors.
Comment 114 alex.wellmann 2016-01-15 22:42:52 UTC
> For both SMR drive can you post smartctl -A /dev/sdX
Main Drive:
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.4.0-040400-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       206075680
  3 Spin_Up_Time            0x0003   090   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       811
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   061   030    Pre-fail  Always       -       69294250
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5715
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       169
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       21475164174
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   077   052   045    Old_age   Always       -       23 (Min/Max 19/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       159
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1005
194 Temperature_Celsius     0x0022   023   048   000    Old_age   Always       -       23 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   119   099   000    Old_age   Always       -       206075680
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       59
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1407 (192 231 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       62824382504
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       97371030204

Mirror Drive (there seems to be the issue):
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   100   006    Pre-fail  Always       -       223354776
  3 Spin_Up_Time            0x0003   092   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       502
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       57320876
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4553
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       54
183 Runtime_Bad_Block       0x0032   095   095   000    Old_age   Always       -       5
184 End-to-End_Error        0x0032   001   001   099    Old_age   Always   FAILING_NOW 2401
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   091   000    Old_age   Always       -       25770262588
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   055   045    Old_age   Always       -       24 (Min/Max 19/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       99
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       585
194 Temperature_Celsius     0x0022   024   045   000    Old_age   Always       -       24 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   119   100   000    Old_age   Always       -       223354776
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       273 (229 40 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       88190076516
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       52861305865

> Assuming they have a single partition: parted /dev/sdX align-check optimal 1

result is for both "1 aligned"

 
> Can you shred the 2nd SMR drive with different sizes (10G/100G/500G/...)?
> shred -v -s 10G -n 1 --> I am shredding on the mirror drive (the one with the
> end-to-end error) now...
Comment 115 Bartłomiej Romański 2016-01-16 00:03:39 UTC
Same issue here. Brand new Samsung Archive 8 TB "fails" after rsyncing ~250GB.

Tested on Arch with fresh 4.4.0-mainline kernel. Single ext4 partition over LUKS. 

Full dmesg in the attached file. Please let me know if you need any extra details.
Comment 116 Bartłomiej Romański 2016-01-16 00:05:32 UTC
Created attachment 199761 [details]
dmesg from boot to drive failure after rsyncing ~250 GB of data
Comment 117 alex.wellmann 2016-01-16 00:13:48 UTC
> Can you shred the 2nd SMR drive with different sizes (10G/100G/500G/...)?

I ran shred with 10G/100G/500G without any issues. I also checked dmesg for errors but it was clean (no issues). In addition I fired up windows and checked with SeaTools for errors (the end-to-end errors made me nervous) but according to the software all is good. I assume that the end-to-end errors are a result of of issues shown in the logs. I use these drives for some time now - so it is not surprising that errors occurred due to the kernel issues (right?)...
Comment 118 Oliver 2016-01-16 12:52:03 UTC
I'm using the 8 TB (AR15) in combination with btrfs on a gentoo 4.3.3 kernel and am experiencing some very strange slowdowns. I have managed to copy 15 TB without trouble and after copy, I like to run rsync -car --progress to verify that the data copied matches the original (bit rot experiences from the past).

In this case, it verified maybe 5 TB but looking at the rsync progress I see speeds slowing down after a few hours. The last few files went around 256 kb/s - 2 mb/s. Even slower then what I expect from the garbage collection progress.

Additionally, I see kworker going mad:
 9012 root      20   0    0,0m   0,0m   0,0m R  99,3  0,0   0,0m 698:42.28 kworker/u8:22           

Looking at the btrfs bits, I see they have been active cumulatively but are 'standing still' for this bit:
 2658 root      20   0    0,0m   0,0m   0,0m S   0,0  0,0   0,0m  44:35.55 btrfs-transacti         

I don't know if this is purely a btrfs thing or the SMR causing havoc in the driver stack.

From what I understood however, the bad performance of the SMR drives was mostly due to writing to the drivers, with the rsync job really only reading, I'm a little puzzled at to what is going on.

dmesg is clean (for reading, when writing i see my LSI controller complaining about delays, but I kind of expected that. After I am able to verify my data is safely copied, I'll give the 4.4 kernel a spin and see if that improves this at all.
Comment 119 Oliver 2016-01-16 15:24:42 UTC
A slight nuance, I accidentally got my rsync commands wrong and was copying as a subdirectory instead of comparing, so the slow speed may be attributed as to the drive being being written full (about 50gb). The rsync job is running with the proper arguments now and will monitor strange CPU loads.
Comment 120 alex.wellmann 2016-01-16 15:34:25 UTC
one update: after shredding on the second drive, I have repartioned it and setup a single partition again in EXT4. now i am copying the files from the first drive using rsync and after 4TB of writes strangely still no errors in dmesg...
Comment 121 Bartłomiej Romański 2016-01-16 17:08:07 UTC
Just tried shreding the drive with 'sudo shred -v -n 1 /dev/sde'.

After ~200 GB shred started reporting errors:

shred: /dev/sde: error writing at offset 228231399936: Input/output error
shred: /dev/sde: error writing at offset 228231400448: Input/output error
shred: /dev/sde: error writing at offset 228231400960: Input/output error
[..]

Dmesg:

[66136.982973] ata2.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x6 frozen
[66136.982977] ata2.00: failed command: WRITE FPDMA QUEUED
[66136.982980] ata2.00: cmd 61/80:a0:00:8e:88/00:00:1a:00:00/40 tag 20 ncq 65536 out
                        res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[66136.982981] ata2.00: status: { DRDY }
[66136.982983] ata2: hard resetting link
[66142.319497] ata2: link is slow to respond, please be patient (ready=0)
[66147.012756] ata2: COMRESET failed (errno=-16)
[66147.012763] ata2: hard resetting link
[66152.345957] ata2: link is slow to respond, please be patient (ready=0)
[66157.042530] ata2: COMRESET failed (errno=-16)
[66157.042535] ata2: hard resetting link
[66162.375757] ata2: link is slow to respond, please be patient (ready=0)
[66192.091844] ata2: COMRESET failed (errno=-16)
[66192.091849] ata2: limiting SATA link speed to 1.5 Gbps
[66192.091851] ata2: hard resetting link
[66197.105067] ata2: COMRESET failed (errno=-16)
[66197.105072] ata2: reset failed, giving up
[66197.105073] ata2.00: disabled
[66197.105084] ata2: EH complete
[66197.105097] scsi_io_completion: 96 callbacks suppressed
[66197.105100] sd 1:0:0:0: [sde] tag#21 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[66197.105102] sd 1:0:0:0: [sde] tag#21 CDB: opcode=0x8a 8a 00 00 00 00 00 1a 88 8e 00 00 00 00 80 00 00
[66197.105103] blk_update_request: 143 callbacks suppressed
[66197.105104] blk_update_request: I/O error, dev sde, sector 445156864
[66197.105218] sd 1:0:0:0: [sde] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[66197.105221] sd 1:0:0:0: [sde] tag#22 CDB: opcode=0x8a 8a 00 00 00 00 00 1a 88 8e 01 00 00 00 7f 00 00
[66197.105222] blk_update_request: I/O error, dev sde, sector 445156865
[66197.105302] sd 1:0:0:0: [sde] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[66197.105305] sd 1:0:0:0: [sde] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 00 1a 88 8e 02 00 00 00 7e 00 00
[66197.105306] blk_update_request: I/O error, dev sde, sector 445156866
[66197.105374] sd 1:0:0:0: [sde] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[66197.105377] sd 1:0:0:0: [sde] tag#24 CDB: opcode=0x8a 8a 00 00 00 00 00 1a 88 8e 03 00 00 00 7d 00 00
[..]
Comment 122 sweeneygj 2016-01-16 17:25:21 UTC
(In reply to alex.wellmann from comment #108)
> I have Upgraded to 4.4.0 running Ubuntu server 15.10 but I am still
> receiving errors.

[Fedora F23, drive set up with single ext4 partition on top of LUKS]

4.4.0 works for me:

uname -r;ddrescue /dev/sdc image.dd image.log
4.4.0-gjs-archive-
GNU ddrescue 1.20
Press Ctrl-C to interrupt
rescued:     1027 GB,   errsize:         0 B,    current rate:    110 MB/s
   ipos:     1027 GB,    errors:         0,      average rate:  81368 kB/s
   opos:     1027 GB,  run time:  3h 30m 33s,  remaining time:      5h 16m
time since last successful read:          0s

No errors reported.

(In reply to alex.wellmann from comment #117)
> I assume that the end-to-end errors are a result of of issues shown in the 
> logs.

I would agree, see comment #17.
Comment 123 sweeneygj 2016-01-16 17:36:42 UTC
(In reply to Bartłomiej Romański from comment #121)
> Just tried shreding the drive with 'sudo shred -v -n 1 /dev/sde'.
> 
> After ~200 GB shred started reporting errors:
> 
> shred: /dev/sde: error writing at offset 228231399936: Input/output error
> shred: /dev/sde: error writing at offset 228231400448: Input/output error
> shred: /dev/sde: error writing at offset 228231400960: Input/output error
> [..]

Your dmesg in comment #115 has this:

[ 3418.329343] ata2.00: device reported invalid CHS sector 0

Does 'CHS sector 0' occur with the shred too?

Just to confirm, this drive hasn't been written to by anything other than kernel 4.4.0 ?
Comment 124 sweeneygj 2016-01-16 17:39:27 UTC
(In reply to Oliver from comment #118)
> I'm using the 8 TB (AR15) in combination with btrfs on a gentoo 4.3.3 kernel

See comment #106.
Comment 125 Bartłomiej Romański 2016-01-16 17:49:12 UTC
(In reply to sweeneygj from comment #123)

> Does 'CHS sector 0' occur with the shred too?

No. Such error does not occur in dmesg while shredding.

> Just to confirm, this drive hasn't been written to by anything other than
> kernel 4.4.0 ?

No. I've tried using this drive with 4.3 before. Then I've switched to 4.4, repartitioned and tried again. Any chance it was in any way permanently broken by 4.3? If yes, can I "reset" it somehow?
Comment 126 sweeneygj 2016-01-16 18:36:13 UTC
(In reply to sweeneygj from comment #123)
> (In reply to Bartłomiej Romański from comment #121)
> > Just tried shreding the drive with 'sudo shred -v -n 1 /dev/sde'.
> > 
> > After ~200 GB shred started reporting errors:
> > 
> > shred: /dev/sde: error writing at offset 228231399936: Input/output error
> > shred: /dev/sde: error writing at offset 228231400448: Input/output error
> > shred: /dev/sde: error writing at offset 228231400960: Input/output error

512 bytes apart ..... bad disk?

From comment #121:

[66197.105104] blk_update_request: I/O error, dev sde, sector 445156864
                  
445156864 x 512 = 227920314368

'smartctl -A /dev/sdX' and check attributes 5, 197 and 198 equal 0.

I would surface scan the drive.

smartctl -t short /dev/sdX
smartctl -t long /dev/sdX

or use seatools to check.

The original reporter of this bug had a bad drive, comment #2.
Comment 127 Bartłomiej Romański 2016-01-16 19:00:37 UTC
'smartctl -A /dev/sdb' looks fine.

'smartctl -t short /dev/sdb' completed without errors, long test running now (936 minutes to complete :/).

Full output below:


=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   096   006    Pre-fail  Always       -       52311640
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   060   060   030    Pre-fail  Always       -       1074340
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       26
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       35
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       17180131334
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   057   045    Old_age   Always       -       33 (Min/Max 23/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       36
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       45
194 Temperature_Celsius     0x0022   033   043   000    Old_age   Always       -       33 (0 4 0 0 0)
195 Hardware_ECC_Recovered  0x001a   113   100   000    Old_age   Always       -       52311640
197 Current_Pending_Sector  0x0012   100   075   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   075   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5 (37 130 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1868257948
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       106309

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%        26         -
# 2  Short offline       Completed without error       00%        26         -
Comment 128 sweeneygj 2016-01-16 19:21:26 UTC
(In reply to Bartłomiej Romański from comment #127)
> 'smartctl -A /dev/sdb' looks fine.
> 
> 'smartctl -t short /dev/sdb' completed without errors, long test running now
> (936 minutes to complete :/).

Not tested but:

smartctl -t  select,445156864-445764455 /dev/sdX

should scan the problem area.
Comment 129 *cJ* 2016-01-16 19:34:13 UTC
If it helps... while a proper solution is found, I have had no problems doing 10 TBs of writes using the blkio controller to limit write speed:
 for x in "8:160" "65:48"; do echo "$x 80000000" | sudo dd of=/sys/fs/cgroup/blkio/blkio.throttle.write_bps_device; done
I went gradually up from 30 MB/s to 80 MB/s and then I got tired because it was good enough for me.

In my experience, the drives are fine, false positive reports of write failures are not recorded by SMART which would be the case if it was a disk write failure.
Comment 130 Bartłomiej Romański 2016-01-17 12:22:01 UTC
both smartctl tests finished without any errors:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        40         -
# 2  Short offline       Completed without error       00%        26         -
Comment 131 sweeneygj 2016-01-18 09:58:54 UTC
(In reply to Bartłomiej Romański from comment #130)
> both smartctl tests finished without any errors:

So the drive is OK.

If there is a bug, running shred again should trigger it. 

However, see comment #108 and comment #120. Same kernel and hardware, two different results.

If the bug from comment #115 appears again, you may need to go back to 3.18.X and see if rsync triggers the error.

Finally, I've just shred an old 750GB drive and got I/O errors. No smart errors. so shred errors can be ignored.
Comment 132 Bartłomiej Romański 2016-01-18 19:03:52 UTC
The drive seems to be working fine now. 

I've followed the protip I've got in private:

- turn off the PC
- disconnect power cable from the drive
- wait a few minutes (to be super sure!)
- reconnect the drive and start the system

After this "very hard reboot" the drive is magically working fine. I've mananged to shred the whole 8TB and rsync ~1.5T without any errors.

Thank you for all your help.
Comment 133 Johannes 2016-01-24 11:01:00 UTC
I'm running 4.4 since release and it seems to run without problems now, big thanks especially to Martin.

However, 4.3 introduced a i915 suspend/hibernate bug that has not been fixed in 4.4 (https://bugs.freedesktop.org/show_bug.cgi?id=92414). Is there any way the SMR patch in question can be backported to 4.1 lts? IMHO, a patch like this even qualifies for inclusion in an lts release such as 4.1.
Comment 134 Russell Sim 2016-02-16 02:04:07 UTC
Created attachment 203711 [details]
Log of drives interfering with other disks on same controller

Hey,

That patch works like a charm some of the time.  But if I have disks on the same controller it seems to cause a cascading failure of all disks on that controller.

So the configuration of that 4 channel SATA controller is:
/dev/sdi /dev/sdj are in a btrfs raid1 both seagates 8TB disks
/dev/sdg /dev/sdh are in a md raid1 both western digital 1TB disks

# uname -a
Linux marvin 4.4.0-rc8-amd64 #1 SMP Debian 4.4~rc8-1~exp1 (2016-01-04) x86_64 GNU/Linux

If I copy data to the btrfs at some point in the future everything will fail disabling all devices on the controller.  I have tried to increase the SATA timeouts and have also tried throttling the data using blkio but that just delays the failure.  I'm trying to copy from a raid5 in the same host to the seagates 8TB drives.  If I do it without any throttling it will fail within minutes of the rsync actually starting to copy data.
Comment 135 hatl 2016-02-16 19:37:23 UTC
I can confirm this behavior.

Having another disk on the same controller makes one of the disks unusable.
In my case, I can't even mount the file system.
Disconnecting one disk resolves the issue.