Bug 215704 - Trouble locating documentation related to disk read timeout /sys/block/*/device/timeout OR /sys/devices/**/timeout
Status: NEW
Alias: None
Product: Documentation
Classification: Unclassified
Component: man-pages
Hardware: All Linux
Importance: P1 enhancement
Assignee: documentation_man-pages@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-20 01:57 UTC by Michael Evans
Modified: 2022-03-20 18:49 UTC
CC List: 0 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:



Description Michael Evans 2022-03-20 01:57:22 UTC
I've been unable to locate Documentation covering these sysfs locations:

/sys/block/*/device/timeout
/sys/devices/**/timeout

Reading them with cat returns 30 in every case. However, an initial experiment with this value seems to contradict assumptions I had made.
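For reference when comparing disks, here is a minimal Python sketch that lists the per-disk timeout values. It assumes the usual layout in which /sys/block/<dev>/device/timeout exists only for SCSI-backed disks (sd, which includes SATA disks driven through libata) and holds a whole number of seconds. On those disks 30 is, as far as I know, the sd driver's default command timeout. `read_scsi_timeouts` is a hypothetical helper, not an existing tool.

```python
from pathlib import Path


def read_scsi_timeouts(sys_block: str = "/sys/block") -> dict:
    """Hypothetical helper: map each disk name to its device/timeout value.

    Assumes /sys/block/<dev>/device/timeout holds an integer number of
    seconds. Devices without that file (loop, zram, etc.) are skipped
    simply because the glob does not match them.
    """
    timeouts = {}
    for path in sorted(Path(sys_block).glob("*/device/timeout")):
        dev = path.parent.parent.name  # .../<dev>/device/timeout -> <dev>
        timeouts[dev] = int(path.read_text().strip())
    return timeouts


if __name__ == "__main__":
    for dev, secs in read_scsi_timeouts().items():
        print(f"{dev}: {secs} s")
```

The `sys_block` parameter exists only so the function can be pointed at a copy of the tree; on a live system the default is what you want.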

My context is TLER (time-limited error recovery), configured with:
smartctl -l scterc,${TIMEOUT_SEC}0,${TIMEOUT_SEC}0 /dev/sdX
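smartctl takes the scterc READTIME,WRITETIME values in units of 100 milliseconds, which is why the command above appends a 0 to a value in seconds. Below is a Python sketch of the same conversion. It also implements a commonly cited rule of thumb: the drive's error-recovery limit should be shorter than the kernel's command timeout, so the drive gives up before the kernel does. Both helper names are hypothetical. The rule of thumb is an assumption about typical practice and does not come from kernel documentation.

```python
def scterc_command(device: str, timeout_sec: float) -> list:
    """Hypothetical helper: build the smartctl call that sets SCT ERC.

    smartctl's scterc values are in units of 100 ms, so seconds are
    multiplied by 10; the shell form ${TIMEOUT_SEC}0 does the same for
    whole seconds by appending a zero.
    """
    deciseconds = round(timeout_sec * 10)
    if deciseconds <= 0:
        raise ValueError("timeout_sec must be at least 0.1 s")
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]


def erc_fits_kernel_timeout(erc_deciseconds: int, kernel_timeout_sec: int) -> bool:
    """Assumed rule of thumb: drive recovery should end before the kernel times out."""
    return erc_deciseconds / 10 < kernel_timeout_sec


if __name__ == "__main__":
    print(" ".join(scterc_command("/dev/sdX", 7)))
    print(erc_fits_kernel_timeout(70, 30))
```

For example, `scterc_command("/dev/sdX", 7)` yields `smartctl -l scterc,70,70 /dev/sdX`, and a 7 s limit fits under the 30 s read from sysfs. The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting.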

I have a software RAID (ZFS in this case) that isn't interacting with the underlying hardware the way it expects. I was hoping to find documentation explaining how block-device tunables such as 'timeout' are intended to be used and tuned.
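Tuning the attribute amounts to writing a new value in seconds. The following is a minimal sketch under three assumptions: /sys/block/<dev>/device/timeout accepts a whole number of seconds, writing it requires root, and the setting does not survive a reboot. Because of that last point, the value would have to be reapplied, for example from a udev rule or a boot script. `set_scsi_timeout` is a hypothetical helper.

```python
from pathlib import Path


def set_scsi_timeout(dev: str, seconds: int, sys_block: str = "/sys/block") -> int:
    """Hypothetical helper: write a new command timeout and read it back.

    Assumes /sys/block/<dev>/device/timeout takes whole seconds. On a real
    system this needs root, and the value is lost at reboot.
    """
    if not isinstance(seconds, int) or seconds <= 0:
        # Guard chosen for this sketch, not a documented kernel limit.
        raise ValueError("seconds must be a positive integer")
    path = Path(sys_block) / dev / "device" / "timeout"
    path.write_text(f"{seconds}\n")
    # Read back so the caller can confirm what the kernel now reports.
    return int(path.read_text().strip())
```

For example, `set_scsi_timeout("sdc", 60)` writes 60 and returns the value the attribute reports afterwards. The 60 is an illustrative value, not a recommendation.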

Searching the Text and reStructuredText files for the obvious keyword didn't turn up anything relevant:

https://github.com/torvalds/linux/search?l=Text&p=6&q=timeout
https://github.com/torvalds/linux/search?l=reStructuredText&p=16&q=timeout
Comment 1 Michael Evans 2022-03-20 18:49:44 UTC
I should add some context. I want the kernel ATA/sd layers to handle unresponsive devices so that, ideally, upper layers receive some kind of 'this path is slow, but you can keep waiting' message. New commands should be soft-failed with a busy state, or something similar that conveys 'stalled' rather than 'error' (so far). I would also hope that any such stall is treated as a barrier for the device, with any other outstanding requests retried unless they too return errors.

Somehow, events such as the dmesg excerpt below add up to enough errors to 'fault' the device and knock it out of the pool during a repair scrub.

So I was looking for a Documentation file that covers the timeout attribute and gives guidance on whether, or how, it should be tuned in relation to other aspects of the disks.

The disk producing these errors is a Seagate Exos X16 (ST16000NM001G-2KK103), firmware SN03, believed to be ATA ACS-4, 4K sectors, CMR. It shows no errors (no pending or remapped sectors, no logged sector failures).

[ 1362.163151] ata3.00: exception Emask 0x10 SAct 0x60000000 SErr 0x280100 action 0x6 frozen
[ 1362.163184] ata3.00: irq_stat 0x08000000, interface fatal error
[ 1362.163200] ata3: SError: { UnrecovData 10B8B BadCRC }
[ 1362.163216] ata3.00: failed command: READ FPDMA QUEUED
[ 1362.163230] ata3.00: cmd 60/c0:e8:28:48:d2/03:00:d9:03:00/40 tag 29 ncq dma 491520 in
                        res 40/00:f0:e8:4b:d2/00:00:d9:03:00/40 Emask 0x10 (ATA bus error)
[ 1362.163272] ata3.00: status: { DRDY }
[ 1362.163283] ata3.00: failed command: READ FPDMA QUEUED
[ 1362.163297] ata3.00: cmd 60/40:f0:e8:4b:d2/00:00:d9:03:00/40 tag 30 ncq dma 32768 in
                        res 40/00:f0:e8:4b:d2/00:00:d9:03:00/40 Emask 0x10 (ATA bus error)
[ 1362.163338] ata3.00: status: { DRDY }
[ 1362.163350] ata3: hard resetting link
[ 1362.476057] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1362.506459] ata3.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1362.506465] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1362.506467] ata3.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1362.564800] ata3.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1362.564815] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1362.564817] ata3.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1362.603044] ata3.00: configured for UDMA/133
[ 1362.603061] sd 2:0:0:0: [sdc] tag#29 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 1362.603065] sd 2:0:0:0: [sdc] tag#29 Sense Key : Illegal Request [current] 
[ 1362.603067] sd 2:0:0:0: [sdc] tag#29 Add. Sense: Unaligned write command
[ 1362.603070] sd 2:0:0:0: [sdc] tag#29 CDB: Read(16) 88 00 00 00 00 03 d9 d2 48 28 00 00 03 c0 00 00
[ 1362.603071] I/O error, dev sdc, sector 16539338792 op 0x0:(READ) flags 0x700 phys_seg 15 prio class 0
[ 1362.603129] zio pool=REDACTED vdev=/dev/disk/by-partlabel/REDACTED error=5 type=1 offset=... size=491520 flags=40080cb0
[ 1362.603239] sd 2:0:0:0: [sdc] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 1362.603276] sd 2:0:0:0: [sdc] tag#30 Sense Key : Illegal Request [current] 
[ 1362.603332] sd 2:0:0:0: [sdc] tag#30 Add. Sense: Unaligned write command
[ 1362.603337] sd 2:0:0:0: [sdc] tag#30 CDB: Read(16) 88 00 00 00 00 03 d9 d2 4b e8 00 00 00 40 00 00
[ 1362.603389] I/O error, dev sdc, sector 16539339752 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
[ 1362.603738] zio pool=REDACTED vdev=/dev/disk/by-partlabel/REDACTED error=5 type=1 offset=... size=32768 flags=1808b0
[ 1362.604011] ata3: EH complete

FAULTED     17     0     0  too many errors  (repairing)
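The failed commands in the log can be cross-checked against each other. The sketch below assumes the SATA SError bit layout with the short names libata prints for each bit, and the standard SCSI READ(16) CDB layout (big-endian LBA in bytes 2-9, transfer length in logical blocks in bytes 10-13). The helper names are hypothetical.

```python
# SError bit -> name as printed by libata's error handler (assumed table).
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode_serror(serr: int) -> list:
    """Hypothetical helper: names of the bits set in a SATA SError value."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if serr & (1 << bit)]


def decode_read16(cdb_hex: str, logical_block_size: int = 512) -> tuple:
    """Hypothetical helper: decode a READ(16) CDB into (lba, blocks, bytes)."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are accepted
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting LBA
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: block count
    return lba, blocks, blocks * logical_block_size


if __name__ == "__main__":
    print(decode_serror(0x280100))
    print(decode_read16("88 00 00 00 00 03 d9 d2 48 28 00 00 03 c0 00 00"))
    print(decode_read16("88 00 00 00 00 03 d9 d2 4b e8 00 00 00 40 00 00"))
```

Decoding `SErr 0x280100` gives ['UnrecovData', '10B8B', 'BadCRC'], the same set libata printed. The two CDBs decode to (16539338792, 960, 491520) and (16539339752, 64, 32768), which matches the `sector` numbers and the 491520- and 32768-byte sizes in the log. The two reads are contiguous (16539338792 + 960 = 16539339752). Since 960 blocks come to 491520 bytes, the drive appears to expose 512-byte logical sectors, so the 4K figure would be the physical sector size. The set bits are link-level errors (8b/10b decode, CRC), and both requests show `cmd_age=0s`. Taken together, this looks more like an interface fault than a command running into the 30 s timeout.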
