Bug 16181

Summary:	libata EH going crazy on hotplug PW protected HDD
Product:	IO/Storage	Reporter:	Robert de Rooy (robert.de.rooy)
Component:	Serial ATA	Assignee:	Jeff Garzik (jgarzik)
Status:	RESOLVED CODE_FIX
Severity:	normal	CC:	alan, kay, mlord, tj
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	2.6.33.5-112.fc13.i686	Subsystem:
Regression:	No	Bisected commit-id:
Attachments:	syslog read_part_sector-sane-retries.patch syslog after patch Example session showing security lock/unlock read_part_sector-sane-retries.patch gzipped syslog kernel config file syslog without hal and udev running syslog without hal and udev running

Description Robert de Rooy 2010-06-11 12:09:46 UTC

When I hotplug a password protected HDD into my ThinkPad ultrabay, the libata EH gets stuck in a loop until I remove the HDD again.

Initially I see these messages in syslog (ICH4 ata_piix);

ACPI: \_SB_.PCI0.IDE0.SCND.MSTR - docking
ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
ata2: ACPI event
ata2: soft resetting link
ata2.00: ATA-6: HTS726060M9AT00, MH4OA6DA, max UDMA/100
ata2.00: 117210240 sectors, multi 0: LBA 
ata2.00: limited to UDMA/33 due to 40-wire cable
ata2.00: configured for UDMA/33
ata2: EH complete
scsi 1:0:0:0: Direct-Access     ATA      HTS726060M9AT00  MH4O PQ: 0 ANSI: 5
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 1:0:0:0: [sdb] 117210240 512-byte logical blocks: (60.0 GB/55.8 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb:

After which there is a continuous stream of error such as these...

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x5
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
        res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { ABRT }
ata2.00: configured for UDMA/33
ata2: EH complete

It seems the EH does not recognize the fact that the HDD is locked and just keeps trying. Instead it should probably recognize it is locked and cause some sort of event for userland to pickup to unlock the HDD.

Comment 1 Robert de Rooy 2010-06-11 12:15:38 UTC

Here is what I see on a T60 with ICH7 (Ultrabay is still PATA)

ACPI: \_SB_.PCI0.IDE0.PRIM.MSTR - docking
Jun 11 14:12:23 t60 kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
Jun 11 14:12:23 t60 kernel: ata5: ACPI event
Jun 11 14:12:24 t60 kernel: ata5: soft resetting link
Jun 11 14:12:24 t60 kernel: ata5.00: ATA-6: HTS726060M9AT00, MH4OA6DA, max UDMA/100
Jun 11 14:12:24 t60 kernel: ata5.00: 117210240 sectors, multi 0: LBA 
Jun 11 14:12:24 t60 kernel: ata5.00: limited to UDMA/33 due to 40-wire cable
Jun 11 14:12:24 t60 kernel: ata5.00: configured for UDMA/33
Jun 11 14:12:24 t60 kernel: ata5: EH complete
Jun 11 14:12:24 t60 kernel: scsi 4:0:0:0: Direct-Access     ATA      HTS726060M9AT00  MH4O PQ: 0 ANSI: 5
Jun 11 14:12:24 t60 kernel: sd 4:0:0:0: Attached scsi generic sg1 type 0
Jun 11 14:12:24 t60 kernel: sd 4:0:0:0: [sdb] 117210240 512-byte logical blocks: (60.0 GB/55.8 GiB)
Jun 11 14:12:24 t60 kernel: sd 4:0:0:0: [sdb] Write Protect is off
Jun 11 14:12:24 t60 kernel: sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 11 14:12:24 t60 kernel: sdb:

And the same continuous stream of errors

Jun 11 14:12:24 t60 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 11 14:12:24 t60 kernel: ata5.00: BMDMA stat 0x25
Jun 11 14:12:24 t60 kernel: ata5.00: failed command: READ DMA
Jun 11 14:12:24 t60 kernel: ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
Jun 11 14:12:24 t60 kernel:         res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
Jun 11 14:12:24 t60 kernel: ata5.00: status: { DRDY ERR }
Jun 11 14:12:24 t60 kernel: ata5.00: error: { ABRT }
Jun 11 14:12:24 t60 kernel: ata5.00: configured for UDMA/33
Jun 11 14:12:24 t60 kernel: ata5: EH complete

Comment 2 Tejun Heo 2010-06-12 17:53:12 UTC

Can you please attach the whole output of dmesg instead of snippets?  How continuous is continuous?  libata itself doesn't really care whether the drive is locked or not.  If it's locked, IOs will fail and whoever is trying to access the drive should give up after a few tries.  It looks like the failing command is read from the partition code which should give up after several retries, doesn't it?

Thanks.

Comment 3 Robert de Rooy 2010-06-13 12:13:10 UTC

Created attachment 26752 [details]
syslog

I mean it keeps trying it indefinably until you remove the disk, which I did in
the attached log file after a bit.

Comment 4 Tejun Heo 2010-06-16 09:24:20 UTC

Hmmm... yeah, that is annoyingly persistent.  It's just going through all the different partition types and libata can't tell whether the command failure is persistent or transient so it can't preemptively fail commands on issue.  Probably the most expensive part of all that is printing out all those information to console.  If you make the kernel to avoid logging to console (e.g. klogconsole -l1), it doesn't take too long to finish, right?

Comment 5 Robert de Rooy 2010-06-16 10:02:35 UTC

klogconsole does not exist on Fedora 13.

But your right, after a bit it does stop. It just took a few tries...

# dmesg |grep EH|wc -l
280

Is there no way for libata to detect the HDD is locked before it starts scanning partition types?

Comment 6 Tejun Heo 2010-06-16 10:11:22 UTC

The thing is that ATA error reporting is not very accurate.  libata can try to interpret the identify data but again the less critical part of it doesn't tend to be too reliable, so the general policy should be try-if-not-completely-sure.  We can suppress or use lower logging level for failures during partition probe but I'm not quite sure whether that would be a good idea either.  How long does it take to actually give up?

Thanks.

Comment 7 Robert de Rooy 2010-06-16 10:37:57 UTC

On the old ThinkPad T41 (ICH4) it took 20 seconds. On the newer T60 (ICH7 PATA) it took 36 seconds instead.

Comment 8 Robert de Rooy 2010-06-16 10:45:05 UTC

The poor performance on the T60 is probably due to the integrated graphics. When I measured it again without following the log, it also dropped to 20 seconds.

Comment 9 Tejun Heo 2010-06-16 11:47:37 UTC

Hmmm.... I'm not sure how to work around this one.  We definitely want to be retrying there but we are trying too hard.  Those reads are coming from partition code and as there's no shared state between different partition probe code different partition types end up issuing reads to the same sectors regardless they failed previously or not.  I think it would be best if partition probe code tries only different sectors.  I'll see if that can be implemented without disturbing the code too much but I have to say it's a pretty niche problem.

Thanks.

Comment 10 Robert de Rooy 2010-06-16 12:20:59 UTC

Well, the next issue would be, how to actually unlock the HDD from Linux, as hdparm just fails on both ThinkPads with an IO error for any of the security commands. Regardless if the HDD is actually locked or not.

# hdparm --user-master u --security-unlock test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_UNLOCK command, password="test", user=user
SECURITY_UNLOCK: Input/output error

Comment 11 Tejun Heo 2010-06-16 12:24:51 UTC

Well, that's something for Mark. cc'd.

Comment 12 Mark Lord 2010-06-16 23:19:09 UTC

The BIOS has probably done a "SECURITY FREEZE" command.
To undo that, you'll have to power-cycle the driver (just pull it out, and plug it back in again quickly).

Then the other commands might work.

Try 'hdparm -I /dev/sdb', and look for 'frozen' or 'not frozen' near the bottom of the output.

Comment 13 Robert de Rooy 2010-06-17 07:23:16 UTC

While that is the case for my boot drive (sda), it is not the case for the just hotplugged drive sdb.

$ sudo hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
	Model Number:       HTS726060M9AT00                         
	Serial Number:      MRH453M4H11ARB
	Firmware Revision:  MH4OA6DA
Standards:
	Used: ATA/ATAPI-6 T13 1410D revision 3a 
	Supported: 6 5 4 
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  117210240
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:       57231 MBytes
	device size with M = 1000*1000:       60011 MBytes (60 GB)
	cache/buffer size  = 7877 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(can be disabled)
	Standby timer values: spec'd by Vendor, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 0
	Advanced power management level: 128
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	   *	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	Advanced Power Management feature set
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	Address Offset Reserved Area Boot
	    	SET_MAX security extension
	    	Automatic Acoustic Management feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	SMART error logging
	   *	SMART self-test
Security: 
	Master password revision code = 65534
		supported
		enabled
		locked
	not	frozen
	not	expired: security count
	not	supported: enhanced erase
	Security level maximum
	38min for SECURITY ERASE UNIT. 
HW reset results:
	CBLID- above Vih
	Device num = 0 determined by the jumper
Checksum: correct

Comment 14 Mark Lord 2010-06-17 12:19:07 UTC

With regards to the original discussion about partition probing:  the IDENTIFY data (shown above) clearly shows the drive as "locked".  So that could be used to prevent the pointless thrashing discussed earlier.

As for unlocking the drive, does this work?

   hdparm --user-master m --security-unlock test /dev/sdb

If not, then what makes you think "test" is the correct password ?

Remember that security commands are designed to be oblique.  They fail without indications as to why they failed, so as not to give clues to attackers.  Thus, attempting to "unlock" a non-locked drive will fail.

-ml

Comment 15 Robert de Rooy 2010-06-17 13:58:23 UTC

The HDD is set in the BIOS with that password, as a "User" password (not the "Master+User" password). And if I boot with the drive present, the BIOS will prompt me for it and unlock it.

Also, *any* hdparm --security-* commands fail with the same IO error regardless if the drive is locked or not. And in none of the cases is hdparm reporting it as being frozen.

[root@t41 robert]# hdparm --user-master m --security-unlock test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_UNLOCK command, password="test", user=master
SECURITY_UNLOCK: Input/output error
[root@t41 robert]# hdparm --user-master u --security-unlock test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_UNLOCK command, password="test", user=user
SECURITY_UNLOCK: Input/output error
[root@t41 robert]# hdparm -I /dev/sdb|grep frozen
	not	frozen

Comment 16 Tejun Heo 2010-06-17 15:52:47 UTC

(In reply to comment #14)
> With regards to the original discussion about partition probing:  the
> IDENTIFY
> data (shown above) clearly shows the drive as "locked".  So that could be
> used
> to prevent the pointless thrashing discussed earlier.

I don't really wanna do that.  There could (or rather would) be devices which have garbages there or fail to update the values after unlocking or whatever.  As a general rule, I don't think it would be wise to preemptively fail commands without actually trying regardless of what the IDENTIFY data says.

Thanks.

Comment 17 Mark Lord 2010-06-17 17:20:22 UTC

There's zero guarantee that the password you type into a BIOS is the exact thing the BIOS passes to the drive.  It could be hashing it for all we know.

Show us an example of a security command working or not-working on an unlocked drive.

And use --verbose with hdparm in both cases, so we can see exactly what the drive is doing.

Comment 18 Robert de Rooy 2010-06-18 07:52:56 UTC

Your right that the BIOS sets any drives that are present during POST to 'frozen', regardless if they have a PW set or not.

I removed the password in the BIOS, and hotplugged the drive again after Linux was booted.

# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
	Model Number:       HTS726060M9AT00                         
	Serial Number:      MRH453M4H11ARB
	Firmware Revision:  MH4OA6DA
Standards:
	Used: ATA/ATAPI-6 T13 1410D revision 3a 
	Supported: 6 5 4 
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  117210240
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:       57231 MBytes
	device size with M = 1000*1000:       60011 MBytes (60 GB)
	cache/buffer size  = 7877 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(can be disabled)
	Standby timer values: spec'd by Vendor, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 0
	Advanced power management level: 128
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	Advanced Power Management feature set
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	Address Offset Reserved Area Boot
	    	SET_MAX security extension
	    	Automatic Acoustic Management feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	SMART error logging
	   *	SMART self-test
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
	not	supported: enhanced erase
	38min for SECURITY ERASE UNIT. 
HW reset results:
	CBLID- above Vih
	Device num = 0 determined by the jumper
Checksum: correct

After this it seems like the hdparm --security-* commands succeed, although setting the master pw does not actually seem to have any result (command succeeds, but IDENTIFY data still reports it as not enabled), the user pw does.

# hdparm --verbose --user-master u --security-set-pass test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_SET_PASS command, password="test", user=user, mode=high
oflags.lob_all=0x82, flags={ feat command }
oflags.hob_all=0x82, flags={ feat }
using LBA48 taskfile
outgoing cdb:  85 0b 26 00 00 00 00 00 00 00 00 00 00 40 f1 00
data:  00 00 74 65 73 74 00 00 00 00 00 00 00 00 00 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 00 00 00 00 00 00 0e 09 0c 01 00 00 00 00 00 00 00 00 00 e0 50 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 01 00 00 00 00 00 00 00 00 00
      ATA_16 stat=50 err=00 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0

# hdparm -I /dev/sdb|grep enabled
		enabled

After this I removed, and plugged the drive again, and indeed it is locked, but the unlock is failing

# hdparm --verbose --user-master u --security-unlock test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_UNLOCK command, password="test", user=user
oflags.lob_all=0x82, flags={ feat command }
oflags.hob_all=0x82, flags={ feat }
using LBA48 taskfile
outgoing cdb:  85 0b 26 00 00 00 00 00 00 00 00 00 00 40 f2 00
data:  00 00 74 65 73 74 00 00 00 00 00 00 00 00 00 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 0b 00 00 00 00 00 0e 09 0c 01 04 00 00 00 00 00 00 00 00 e0 51 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 01 04 00 00 00 00 00 00 00 00
      ATA_16 stat=51 err=04 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0
I/O error, ata_op=0xf2 ata_status=0x51 ata_error=0x04
SECURITY_UNLOCK: Input/output error

# hdparm --verbose --user-master u --security-disable test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_DISABLE command, password="test", user=user
oflags.lob_all=0x82, flags={ feat command }
oflags.hob_all=0x82, flags={ feat }
using LBA48 taskfile
outgoing cdb:  85 0b 26 00 00 00 00 00 00 00 00 00 00 40 f2 00
data:  00 00 74 65 73 74 00 00 00 00 00 00 00 00 00 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 0b 00 00 00 00 00 0e 09 0c 01 04 00 00 00 00 00 00 00 00 e0 51 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 01 04 00 00 00 00 00 00 00 00
      ATA_16 stat=51 err=04 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0
I/O error, ata_op=0xf2 ata_status=0x51 ata_error=0x04
SECURITY_UNLOCK: Input/output error

# hdparm --verbose --user-master u --security-erase test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_ERASE command, password="test", user=user
outgoing cdb:  85 06 20 00 00 00 00 00 00 00 00 00 00 40 f3 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 00 00 00 00 00 00 0e 09 0c 00 00 00 00 00 00 00 00 00 00 e0 50 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 00 00 00 00 00 00 00 00 00 00
      ATA_16 stat=50 err=00 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0
oflags.lob_all=0x82, flags={ feat command }
oflags.hob_all=0x82, flags={ feat }
using LBA48 taskfile
outgoing cdb:  85 0b 26 00 00 00 00 00 00 00 00 00 00 40 f4 00
data:  00 00 74 65 73 74 00 00 00 00 00 00 00 00 00 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 0b 00 00 00 00 00 0e 09 0c 01 04 00 00 00 00 00 00 00 00 e0 51 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 01 04 00 00 00 00 00 00 00 00
      ATA_16 stat=51 err=04 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0
I/O error, ata_op=0xf4 ata_status=0x51 ata_error=0x04
SECURITY_ERASE: Input/output error

And the BIOS is not recognizing the password either, so it seems your right on the hashing part, and I have transformed the drive into a doorstop.

Comment 19 Tejun Heo 2010-06-18 13:35:18 UTC

Created attachment 26843 [details]
read_part_sector-sane-retries.patch

Can you please try this patch and see whether it helps?

Thanks.

Comment 20 Mark Lord 2010-06-18 19:20:32 UTC

Try unlocking it with the master password that you set (even though you said it didn't appear to lock after setting the master password).

Comment 21 Robert de Rooy 2010-06-18 21:43:50 UTC

Created attachment 26851 [details]
syslog after patch

I applied the patch on top of 2.6.35-rc3, but it did not seem to reduce the messages, in fact it may have done the opposite

$ cat messages.txt |grep EH|wc -l
1740

I even had to compress the log file to get it under the bugzilla limit of 1MB.

Comment 22 Robert de Rooy 2010-06-18 21:49:24 UTC

(In reply to comment #20)
> Try unlocking it with the master password that you set (even though you said
> it
> didn't appear to lock after setting the master password).

After hotplugging the drive with the 2.6.35-rc3 kernel

# hdparm -I /dev/sdb|grep -A9 Security:
Security: 
	Master password revision code = 65297
		supported
		enabled
		locked
	not	frozen
	not	expired: security count
	not	supported: enhanced erase
	Security level high
	38min for SECURITY ERASE UNIT.

# hdparm --verbose --user-master m --security-unlock test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_UNLOCK command, password="test", user=master
oflags.lob_all=0x82, flags={ feat command }
oflags.hob_all=0x82, flags={ feat }
using LBA48 taskfile
outgoing cdb:  85 0b 26 00 00 00 00 00 00 00 00 00 00 40 f2 00
data:  01 00 74 65 73 74 00 00 00 00 00 00 00 00 00 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 0b 00 00 00 00 00 0e 09 0c 01 04 00 00 00 00 00 00 00 00 e0 51 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 01 04 00 00 00 00 00 00 00 00
      ATA_16 stat=51 err=04 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0
I/O error, ata_op=0xf2 ata_status=0x51 ata_error=0x04
SECURITY_UNLOCK: Input/output error

# hdparm --verbose --user-master m --security-disable test /dev/sdb
security_password="test"

/dev/sdb:
 Issuing SECURITY_DISABLE command, password="test", user=master
oflags.lob_all=0x82, flags={ feat command }
oflags.hob_all=0x82, flags={ feat }
using LBA48 taskfile
outgoing cdb:  85 0b 26 00 00 00 00 00 00 00 00 00 00 40 f2 00
data:  01 00 74 65 73 74 00 00 00 00 00 00 00 00 00 00
SG_IO: ATA_16 status=0x2, host_status=0x0, driver_status=0x8
SG_IO: sb[]:  72 0b 00 00 00 00 00 0e 09 0c 01 04 00 00 00 00 00 00 00 00 e0 51 00 00 00 00 00 00 00 00 00 00
SG_IO: desc[]:  09 0c 01 04 00 00 00 00 00 00 00 00
      ATA_16 stat=51 err=04 nsect=00 lbal=00 lbam=00 lbah=00 dev=e0
I/O error, ata_op=0xf2 ata_status=0x51 ata_error=0x04
SECURITY_UNLOCK: Input/output error

Comment 23 Mark Lord 2010-06-18 22:18:34 UTC

The data going out all looks correct there.

Here (attached in a sec) is a log of a session I just did here, locking and unlocking a drive on 2.6.32.5, using my latest (pre-release) of hdparm (no security changes).

Comment 24 Mark Lord 2010-06-18 22:19:27 UTC

Created attachment 26854 [details]
Example session showing security lock/unlock

Comment 25 Tejun Heo 2010-06-19 08:17:03 UTC

Created attachment 26859 [details]
read_part_sector-sane-retries.patch

Can you please try this one then?  Maybe the IOs are not coming from partition check code at all?  Can you please attach .config?

Comment 26 Robert de Rooy 2010-06-19 21:43:48 UTC

Created attachment 26864 [details]
gzipped syslog

Tejun,

Here is the syslog after the second patch, the log is still getting flooded.

Comment 27 Robert de Rooy 2010-06-19 21:47:40 UTC

Created attachment 26865 [details]
kernel config file

Comment 28 Robert de Rooy 2010-06-19 21:57:59 UTC

(In reply to comment #23)
> The data going out all looks correct there.
> 
> Here (attached in a sec) is a log of a session I just did here, locking and
> unlocking a drive on 2.6.32.5, using my latest (pre-release) of hdparm (no
> security changes).

Strange, as you can see from my logs, I was able to lock the drive, but any unlock commands just fail with IO errors. I tried 2 different ThinkPads (T41 and T60), and it made no difference. I did use the same HDD in both cases, but am rather hesitant to try another one. Especially since the BIOS based lock/unlock had always worked fine.

Any chance there may be a difference due to the fact that I used a PATA drive and you used a SATA drive?

I guess the hdparm version shipped with F13 is new enough
hdparm-9.27-1.fc13.x86_64

Comment 29 Tejun Heo 2010-06-21 12:05:01 UTC

So, most of those reads aren't from kernel at all.  Can you please test w/ udev and hal stopped?

Comment 30 Robert de Rooy 2010-06-21 14:36:12 UTC

Created attachment 26880 [details]
syslog without hal and udev running

stopping hal does not make any difference, but stopping udev drops the number of EH messages to 55.

Comment 31 Tejun Heo 2010-06-21 14:37:29 UTC

Can you please try with the patch applied and udev stopped?

Comment 32 Tejun Heo 2010-06-21 14:42:25 UTC

And I suppose it's about time to summon Kay (hello!).

Kay, this is a rather rare corner case but sometimes disk device exists but fails all IOs.  It looks like udev is a bit too persistent in retrying and keeps issuing IOs to the same sectors.  Do you think this is something which can be improved from udev?

Thanks.

Comment 33 Kay Sievers 2010-06-21 15:09:41 UTC

On Mon, Jun 21, 2010 at 16:42,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> Kay, this is a rather rare corner case but sometimes disk device exists but
> fails all IOs.  It looks like udev is a bit too persistent in retrying and
> keeps issuing IOs to the same sectors.  Do you think this is something which
> can be improved from udev?

Hmm, not sure if udev can ever decide that a device is not readable
and switch some state here.

Maybe the first thing would be to add the needed stuff to ata_id:
  http://git.kernel.org/?p=linux/hotplug/udev.git;a=blob;f=extras/ata_id/ata_id.c;hb=HEAD
to identify locked/protected disks?

And we can possibly avoid some further probing of the disk based on
that information? At least the blkid call (which will do quite a few
small reads) we could exclude, but there might be several things
plugging into udev events which are not really under our control.

Comment 34 Tejun Heo 2010-06-21 15:20:50 UTC

Hmmm... maybe, I'm not sure.  In general, for ATA, I think things should be trial based as long as possible.  There are a lot of devices with extremely poorly done firmwares, especially in SSD and CF devices, and we run the risk of breaking working devices if we deny access without actually trying based on the IDENTIFY data.  I wrote this earlier but I have this this nagging feeling that there gotta be devices which would be okay if security lock isn't used at all but fail to update locked status if it's used and unlocked.

Maybe ata_id can check IDENTIFY data and then issue a read to check whether it's actually locked.  That should be mostly safe, I think.

One way or the other, I think it's generally low priority.  It's far more important to keep working things working and ATA security lock feature isn't a very often used feature anyway (its usefulness as a security measure is dubious to begin with).

Thanks.

Comment 35 Robert de Rooy 2010-06-22 12:32:47 UTC

Created attachment 26901 [details]
syslog without hal and udev running

with the kernel patch applied, without udev & hal running. The EH is 'only' being called 13 times now.