Bug 80031

Summary: libata handles ATA_UNC but does not handle ATA_AMNF. Is this intentional or a bug?
Product: IO/Storage Reporter: Alexey Asemov (alex)
Component: Serial ATAAssignee: Tejun Heo (tj)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, alex, andrey_utkin
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: latest Subsystem:
Regression: No Bisected commit-id:
Attachments: Proposed patch to handle AMNF

Description Alexey Asemov 2014-07-11 17:57:53 UTC
Having some too bad drive to read I had multiple strange "device error" retries without any sense code from libata while imaging the drive.

After having spent some time in libata-eh.c and analyzing dmesg output, I found that drive/controller pair returns 0x01 (AMNF = Address Mark Not Found) flag in error byte, but this flag is unhandled as media error in libata-eh.c here:

---
	case ATA_DEV_ATA:
		if (err & ATA_ICRC)
			qc->err_mask |= AC_ERR_ATA_BUS;
		if (err & ATA_UNC)
			qc->err_mask |= AC_ERR_MEDIA;
		if (err & ATA_IDNF)
			qc->err_mask |= AC_ERR_INVALID;
		break;
---

and here

---
	case ATA_DEV_ATA:
		if (err & ATA_ICRC)
			qc->err_mask |= AC_ERR_ATA_BUS;
		if (err & ATA_UNC)
			qc->err_mask |= AC_ERR_MEDIA;
		if (err & ATA_IDNF)
			qc->err_mask |= AC_ERR_INVALID;
		break;
---

so SCSI stack gets some error that it retries on and on. AMNF condition is described in libata-scsi though here:

---
		/*  Bad address mark */
		{0x01, 		MEDIUM_ERROR, 0x13, 0x00}, 	// Address mark not found       Address mark not found for data field
---

Just adding AMNF handling 
-
-if (err & ATA_UNC)
+if (err & (ATA_UNC | ATA_AMNF))
-
(and displaying, but that does not matter) allowed media errors with 'Address Mark Not Found' sense to appear instead of the 'device error', speeding up error handling a lot.

So if not handling AMNF is intentional, it may be a bug that hinders operations with devices that are too bad.
Comment 1 Andrey Utkin 2014-07-11 19:21:57 UTC
AMNF bit is not mentioned at all in latest release of ATA specs, so recent drives are not supposed to indicate it, and developers can be unaware of this bit if they don't check deprecated documentation versions.
BTW I also don't see checks for "ABRT" bit. Cannot say if it is really needed, i'm not a pro on kernal ATA management.

Please go on with submission of a patch.
Comment 2 Alexey Asemov 2014-07-11 19:46:28 UTC
Created attachment 142751 [details]
Proposed patch to handle AMNF

libata-eh.c should handle AMNF error condition (error byte bit 0, usually
code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack
can handle it properly (translation code 0x01 is already present in 
libata-scsi.c) but was never passed down due to lack of handling in EH.
Comment 3 Alexey Asemov 2014-07-11 19:48:14 UTC
Must be tested though to ensure it does not break something, if AMNF is currently undocumented. I don't think someone will break the compatibility though using AMNF because many software and even BIOS'es will break then.
Comment 4 Andrey Utkin 2014-07-11 19:52:00 UTC
Ok, have you posted it?
To:
Tejun Heo <tj@kernel.org> (supporter:SERIAL ATA (SATA)...)
linux-ide@vger.kernel.org (open list:SERIAL ATA (SATA)...)
linux-kernel@vger.kernel.org (open list)
Comment 5 Alexey Asemov 2014-07-11 19:56:08 UTC
Do I need to mail it now to the addresses above?
Comment 6 Andrey Utkin 2014-07-11 19:57:13 UTC
Yes, preferrably in the way the command "git send-email" does.
Comment 7 Alexey Asemov 2014-07-12 10:58:00 UTC
Mailed it.