Having some too bad drive to read I had multiple strange "device error" retries without any sense code from libata while imaging the drive. After having spent some time in libata-eh.c and analyzing dmesg output, I found that drive/controller pair returns 0x01 (AMNF = Address Mark Not Found) flag in error byte, but this flag is unhandled as media error in libata-eh.c here: --- case ATA_DEV_ATA: if (err & ATA_ICRC) qc->err_mask |= AC_ERR_ATA_BUS; if (err & ATA_UNC) qc->err_mask |= AC_ERR_MEDIA; if (err & ATA_IDNF) qc->err_mask |= AC_ERR_INVALID; break; --- and here --- case ATA_DEV_ATA: if (err & ATA_ICRC) qc->err_mask |= AC_ERR_ATA_BUS; if (err & ATA_UNC) qc->err_mask |= AC_ERR_MEDIA; if (err & ATA_IDNF) qc->err_mask |= AC_ERR_INVALID; break; --- so SCSI stack gets some error that it retries on and on. AMNF condition is described in libata-scsi though here: --- /* Bad address mark */ {0x01, MEDIUM_ERROR, 0x13, 0x00}, // Address mark not found Address mark not found for data field --- Just adding AMNF handling - -if (err & ATA_UNC) +if (err & (ATA_UNC | ATA_AMNF)) - (and displaying, but that does not matter) allowed media errors with 'Address Mark Not Found' sense to appear instead of the 'device error', speeding up error handling a lot. So if not handling AMNF is intentional, it may be a bug that hinders operations with devices that are too bad.
AMNF bit is not mentioned at all in latest release of ATA specs, so recent drives are not supposed to indicate it, and developers can be unaware of this bit if they don't check deprecated documentation versions. BTW I also don't see checks for "ABRT" bit. Cannot say if it is really needed, i'm not a pro on kernal ATA management. Please go on with submission of a patch.
Created attachment 142751 [details] Proposed patch to handle AMNF libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH.
Must be tested though to ensure it does not break something, if AMNF is currently undocumented. I don't think someone will break the compatibility though using AMNF because many software and even BIOS'es will break then.
Ok, have you posted it? To: Tejun Heo <tj@kernel.org> (supporter:SERIAL ATA (SATA)...) linux-ide@vger.kernel.org (open list:SERIAL ATA (SATA)...) linux-kernel@vger.kernel.org (open list)
Do I need to mail it now to the addresses above?
Yes, preferrably in the way the command "git send-email" does.
Mailed it.