Bug 89171
Summary: | failed command: READ FPDMA QUEUED - Emask 0x4 (timeout) - Samsung XP941 MZHPU128 | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Dominik Mierzejewski (dominik) |
Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | am1n, sumitrai96, sven.koehler |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.17.4-200.fc20.x86_64 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | lspci -vnn; hdparm -I /dev/sda; samsung-a800-nomsi.patch; dmesg after applying Tejun's patch; dmesg before applying Tejun's patch (ran with libata.force=noncq) | ||

Description
Dominik Mierzejewski
2014-12-02 12:31:57 UTC
Comment #1 (Tejun Heo):
Can you please post the output of "lspci -nn"? Thanks.

Comment #2 (Sumit Rai):
I am also experiencing the similar issue. Please take a look.

Description of problem:
I am seeing these error messages in the dmesg output, and sometimes the computer freezes. However, this is a new SSD and SMART tests are passing. I am dual booting with another OS (Mac OS X) and don't have any issues on that OS.

Version-Release number of selected component (if applicable):
3.17.4-300.fc21.x86_64
BOOT_IMAGE=/boot/vmlinuz-3.17.4-300.fc21.x86_64 root=/dev/mapper/fedora_20-root ro selinux=0

Please also take a look at Bug 1084928, which appears pretty similar.

Dmesg:
[ 824.591117] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
[ 824.591324] ata1: SError: { PHYRdyChg CommWake }
[ 824.591442] ata1.00: failed command: IDLE
[ 824.591553] ata1.00: cmd e3/00:e6:00:00:00/00:00:00:00:00/40 tag 23 res 40/00:d2:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 824.591872] ata1.00: status: { DRDY }
[ 824.591969] ata1: hard resetting link
[ 824.896939] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 824.897620] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 824.897819] ata1.00: supports DRM functions and may not be fully accessible
[ 824.906648] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 824.906840] ata1.00: supports DRM functions and may not be fully accessible
[ 824.915151] ata1.00: configured for UDMA/133
[ 824.926007] ata1: EH complete
[ 925.539433] ata1.00: exception Emask 0x0 SAct 0xc000000 SErr 0x50000 action 0x6 frozen
[ 925.543577] ata1: SError: { PHYRdyChg CommWake }
[ 925.547750] ata1.00: failed command: READ FPDMA QUEUED
[ 925.551492] ata1.00: cmd 60/10:d0:18:c7:68/00:00:39:00:00/40 tag 26 ncq 8192 in res 40/00:fa:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 925.559065] ata1.00: status: { DRDY }
[ 925.562429] ata1.00: failed command: WRITE FPDMA QUEUED
[ 925.565739] ata1.00: cmd 61/20:d8:b8:87:a5/00:00:39:00:00/40 tag 27 ncq 16384 out res 40/00:be:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 925.572431] ata1.00: status: { DRDY }
[ 925.575524] ata1: hard resetting link
[ 925.883261] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 925.888396] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 925.892562] ata1.00: supports DRM functions and may not be fully accessible
[ 925.905285] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 925.909043] ata1.00: supports DRM functions and may not be fully accessible
[ 925.920859] ata1.00: configured for UDMA/133
[ 925.935261] ata1.00: device reported invalid CHS sector 0
[ 925.939878] ata1.00: device reported invalid CHS sector 0
[ 925.944296] ata1: EH complete

[root@localhost ~]# smartctl -H /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-300.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Created attachment 159491 [details]
lspci -vnn
Created attachment 159501 [details]
hdparm -I /dev/sda
(In reply to Tejun Heo from comment #1)
> Can you please post the output of "lspci -nn"?

# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Haswell-ULT DRAM Controller [8086:0a04] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a16] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Haswell-ULT HD Audio Controller [8086:0a0c] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series USB xHCI HC [8086:9c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series HECI #0 [8086:9c3a] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series HD Audio Controller [8086:9c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series PCI Express Root Port 3 [8086:9c14] (rev e4)
00:1c.3 PCI bridge [0604]: Intel Corporation 8 Series PCI Express Root Port 4 [8086:9c16] (rev e4)
00:1c.4 PCI bridge [0604]: Intel Corporation 8 Series PCI Express Root Port 6 [8086:9c1a] (rev e4)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series USB EHCI #1 [8086:9c26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation 8 Series LPC Controller [8086:9c43] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series SMBus Controller [8086:9c22] (rev 04)
01:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 6b)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader [10ec:5209] (rev 01)
03:00.0 SATA controller [0106]: Samsung Electronics Co Ltd XP941 PCIe SSD [144d:a800] (rev 01)

Comment #6 (Dominik Mierzejewski):
(In reply to Sumit Rai from comment #2)
> I am also experiencing the similar issue. Please take a look.
[...]
> Dmesg:
> [ 824.591117] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
> [ 824.591324] ata1: SError: { PHYRdyChg CommWake }
> [ 824.591442] ata1.00: failed command: IDLE
> [ 824.591553] ata1.00: cmd e3/00:e6:00:00:00/00:00:00:00:00/40 tag 23 res 40/00:d2:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> [ 824.591872] ata1.00: status: { DRDY }
> [ 824.591969] ata1: hard resetting link
> [ 824.896939] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 824.897620] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [ 824.897819] ata1.00: supports DRM functions and may not be fully accessible
> [ 824.906648] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [ 824.906840] ata1.00: supports DRM functions and may not be fully accessible
> [ 824.915151] ata1.00: configured for UDMA/133
> [ 824.926007] ata1: EH complete

This doesn't look similar at all, the error messages are completely different. Please open a separate bug report instead of introducing noise here.

For the record, this was originally reported (by me) in Fedora as https://bugzilla.redhat.com/show_bug.cgi?id=1084928 .

Comment #8 (Tejun Heo):
Created attachment 159561 [details]
samsung-a800-nomsi.patch
Can you please see whether the attached patch resolves the issue? Please post boot dmesg before and after the patch.
Thanks.
(In reply to Dominik Mierzejewski from comment #6)
> (In reply to Sumit Rai from comment #2)
> > I am also experiencing the similar issue. Please take a look.
> [...]
> > Dmesg:
> > [ 824.591117] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
> > [ 824.591324] ata1: SError: { PHYRdyChg CommWake }
> > [ 824.591442] ata1.00: failed command: IDLE
> > [ 824.591553] ata1.00: cmd e3/00:e6:00:00:00/00:00:00:00:00/40 tag 23 res 40/00:d2:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> > [ 824.591872] ata1.00: status: { DRDY }
> > [ 824.591969] ata1: hard resetting link
> > [ 824.896939] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [ 824.897620] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [ 824.897819] ata1.00: supports DRM functions and may not be fully accessible
> > [ 824.906648] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> > [ 824.906840] ata1.00: supports DRM functions and may not be fully accessible
> > [ 824.915151] ata1.00: configured for UDMA/133
> > [ 824.926007] ata1: EH complete
>
> This doesn't look similar at all, the error messages are completely
> different. Please open a separate bug report instead of introducing noise
> here.

You are right, the part of the output you have quoted doesn't look similar.

1. However, if you scroll down you will find
"[ 925.539433] ata1.00: exception Emask 0x0 SAct 0xc000000 SErr 0x50000 action 0x6 frozen
[ 925.543577] ata1: SError: { PHYRdyChg CommWake }
[ 925.547750] ata1.00: failed command: READ FPDMA QUEUED
[ 925.551492] ata1.00: cmd 60/10:d0:18:c7:68/00:00:39:00:00/40 tag 26 ncq 8192 in res 40/00:fa:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 925.559065] ata1.00: status: { DRDY }
..................
[ 925.920859] ata1.00: configured for UDMA/133
[ 925.935261] ata1.00: device reported invalid CHS sector 0
[ 925.939878] ata1.00: device reported invalid CHS sector 0
[ 925.944296] ata1: EH complete"
which looks pretty similar to the error Mr. Dominik Mierzejewski is getting.

2. I looked at your patch and got the idea to reboot my machine with the pci=nomsi kernel parameter. It seems to fix the issue for now; I am no longer getting the error messages. Thanks.

3. My machine and Mr. Dominik's machine are both using SSDs. I have not seen this issue with different storage media in the same setup.

4. pci=nomsi not only fixed READ FPDMA QUEUED, it also fixed the error message quoted by you, i.e.
[ 824.591324] ata1: SError: { PHYRdyChg CommWake }
[ 824.591442] ata1.00: failed command: IDLE

Points 1, 2, 3, and 4 lead me to believe it's the same or a similar issue. If it is the same issue, it would be redundant to file a new bug report. If you still believe it's a different issue, I will file a new bug report. However, if you believe it's the same: since pci=nomsi disables MSI interrupts system-wide, it would help to have a patch with the PCI ID of my system's SATA controller only. To that end, I have already attached the output of lspci -vnn in my earlier comment.

I am still getting the following, which I agree is a different issue and should not be discussed in this bug report:
[ 3.043884] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 3.044721] ata2.00: ATAPI: MATSHITADVD-R UJ-8A8, HA13, max UDMA/100
[ 3.048037] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 3.048863] ata2.00: configured for UDMA/100
pci=nomsi helps here as well.

I'm building a kernel with Tejun's patch from comment #8 right now.

(In reply to Tejun Heo from comment #8)
> Created attachment 159561 [details]
> samsung-a800-nomsi.patch
>
> Can you please see whether the attached patch resolves the issue? Please
> post boot dmesg before and after the patch.

It does.
Created attachment 159691 [details]
dmesg after applying Tejun's patch
Created attachment 159701 [details]
dmesg before applying Tejun's patch (ran with libata.force=noncq)
Out of curiosity, what are the downsides of disabling MSI for my disk/controller vs. disabling NCQ?

Without NCQ, the drive can only process one command at a time, which can show up as noticeable performance degradation on certain workloads. Disabling MSI shouldn't cause any noticeable difference. It may lead to a marginal increase in CPU consumption, but it should be minuscule.

Comment #15 (Tejun Heo):
Sumit, your issue is entirely different. It's a different controller leading to different errors under different circumstances. Can you please open a new bug report? This MSI/NCQ issue has already been reported on a similar Samsung device, but I don't want to blanket disable MSI on an Intel chipset without knowing what's going on.

Thanks.

Patch posted and applied.

http://lkml.kernel.org/g/20141204181959.GB4080@htj.dyndns.org

Thanks.

(In reply to Tejun Heo from comment #15)
> Sumit, your issue is entirely different. It's a different controller leading
> to different errors under different circumstances. Can you please open a new
> bug report? This MSI/NCQ issue has already been reported on a similar
> Samsung device, but I don't want to blanket disable MSI on an Intel chipset
> without knowing what's going on.
>
> Thanks.

Thanks for clarifying. I have not been able to reproduce this issue after removing VirtualBox (vboxdrv), which taints the kernel. I will file a new bug if the issue occurs again in an untainted kernel. Sorry for the inconvenience.

TWIMC: Filed a new bug: https://bugzilla.kernel.org/show_bug.cgi?id=89261.