Bug 12975
Summary: | ata2: lots of error messages, but OS still running well | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Peter Teoh (htmldeveloper) |
Component: | Serial ATA | Assignee: | Jeff Garzik (jgarzik) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | akpm, rjw, thomas.creutz, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.29 (linus tree) | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 12398 | ||
Attachments: |
messages.log
sense-debug.patch full dmesg |
Description
Peter Teoh
2009-03-30 08:03:33 UTC
Reassigned to ata, marked as a regression. Strange. Can you please attach full kernel log including boot messages? Sorry, this interface does not allow more than 65535 characters, so here it is: http://pastebin.com/f218b3add reboot over two different days. Thanks. Created attachment 20836 [details]
messages.log
It's usually better to attach log files. Fetching the log and attaching.
Created attachment 20971 [details]
sense-debug.patch
Can you please apply the attached patch and post the resulting kernel log? Thanks.
Thank you Tejun Heo. After you patch came out, i was frantically trying to duplicate the bug again, including rebooting into the same past kernel, all on the same machine. To no avail. Both past version and current 2.6.30-rc1 linus git tree version has no such "ata2....." kind of error statement in dmesg. Question: Is this error possible arising from instrumenting other parts of the kernel codes? (eg, the vfs_read() interface)? # Mar 30 11:13:21 coffeegold kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 # Mar 30 11:13:21 coffeegold kernel: ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 # Mar 30 11:13:21 coffeegold kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 # Mar 30 11:13:21 coffeegold kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error) # Mar 30 11:13:21 coffeegold kernel: ata2.00: status: { DRDY ERR } # Mar 30 11:13:26 coffeegold kernel: ata2.00: qc timeout (cmd 0xa0) # Mar 30 11:13:26 coffeegold kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen # Mar 30 11:13:26 coffeegold kernel: ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 # Mar 30 11:13:26 coffeegold kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 # Mar 30 11:13:26 coffeegold kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout) # Mar 30 11:13:26 coffeegold kernel: ata2.00: status: { DRDY ERR } # Mar 30 11:13:31 coffeegold kernel: ata2: link is slow to respond, please be patient (ready=0) # Mar 30 11:13:35 coffeegold dhclient: DHCPREQUEST on eth0 to 192.168.0.20 port 67 # Mar 30 11:13:35 coffeegold dhclient: DHCPACK from 192.168.0.20 # Mar 30 11:13:35 coffeegold dhclient: bound to 192.168.128.245 -- renewal in 257 seconds. # Mar 30 11:13:36 coffeegold kernel: ata2: device not ready (errno=-16), forcing hardreset # Mar 30 11:13:36 coffeegold kernel: ata2: soft resetting link # Mar 30 11:13:36 coffeegold kernel: ata2.00: configured for UDMA/33 # Mar 30 11:13:36 coffeegold kernel: ata2: EH complete # Mar 30 11:17:14 coffeegold kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 # Mar 30 11:17:14 coffeegold kernel: ata2.00: BMDMA stat 0x25 # Mar 30 11:17:14 coffeegold kernel: ata2.00: cmd a0/01:00:00:fe:00/00:00:00:00:00/a0 tag 0 dma 16640 in # Mar 30 11:17:14 coffeegold kernel: cdb 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 # Mar 30 11:17:14 coffeegold kernel: res d1/54:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) # Mar 30 11:17:14 coffeegold kernel: ata2.00: status: { Busy } # Mar 30 11:17:14 coffeegold kernel: ata2: soft resetting link # Mar 30 11:17:14 coffeegold kernel: ata2.00: configured for UDMA/33 # Mar 30 11:17:14 coffeegold kernel: ata2: EH complete As I was reading this drivers/ata/libata-eh.c: 546 /* synchronize with host lock and sort out timeouts */ 547 548 /* For new EH, all qcs are finished in one of three ways - 549 * normal completion, error completion, and SCSI timeout. 550 * Both completions can race against SCSI timeout. When normal 551 * completion wins, the qc never reaches EH. When error 552 * completion wins, the qc has ATA_QCFLAG_FAILED set. 553 * 554 * When SCSI timeout wins, things are a bit more complex. 555 * Normal or error completion can occur after the timeout but 556 * before this point. In such cases, both types of 557 * completions are honored. A scmd is determined to have 558 * timed out iff its associated qc is active and not failed. 559 */ I suspect it may be due to a slight increase in overheads in processing the instrumented vfs_read(), giving rise to timing delays. Correct? Apologize for the effort spent. U can close this bug report. Thanks. Hmm.... I don't think it's something which depends on timing but you never know. Well, please keep an eye on it and if it occurs again, reopen this bug report. Thanks. hi.. I have also this messages on my system (running current Ubuntu 9.04 with kernel image 2.6.28-13-generic, but also on Ubuntu 8.10 I had this problems) I have also very bad performance when I have heavy harddisk activity (copy files, running vmware....) I try to recompile the module and attach the log Created attachment 22293 [details]
full dmesg
Thomas, can you please file a separate bug report and attach results from 2.6.30? Thankks. hm.. ok. |