Bug 7979

Summary: | 2.6.20-rt2: high prio task yields journal commit I/O error | |
---|---|---|---
Product: | Alternate Trees | Reporter: | Robert Crocombe (rcrocomb)
Component: | rt | Assignee: | alt-trees_rt
Status: | REJECTED INVALID | |
Severity: | normal | |
Priority: | P2 | |
Hardware: | i386 | |
OS: | Linux | |
Kernel Version: | 2.6.20-rt2 | Subsystem: |
Regression: | --- | Bisected commit-id: |
Attachments: | dmesg; Kernel config; Kernel config for 2.6.20-rt8; dmesg for 2.6.20-rt8 | |
Description
Robert Crocombe, 2007-02-09 15:35:52 UTC
Created attachment 10373 [details]
dmesg
Created attachment 10374 [details]
Kernel config
Reply-To: akpm@linux-foundation.org

On Fri, 9 Feb 2007 15:45:31 -0800 bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=7979
>
> Summary: 2.6.20-rt2: high prio task yields journal commit I/O error

Well you've been having fun there. I'd say that you've hit two (maybe 1.5) bugs in the scsi code:

- I assume that your high-priority task has starved a scsi kernel thread for so long that when that thread finally got control, it decided that something had timed out and declared an error. Maybe. Or perhaps the card decided that it hadn't been serviced for so long that it declared an error. It would need someone who is familiar with scsi and aic7xxx to determine that.

- In response to the timeout, aic7xxx error handling went and passed crap into the scatter/gather unmapping code and the kernel oopsed.

Frankly, I doubt that either of these things (or at least, the first one) is likely to be fixed in a hurry, and I'd suggest that you look at continuing your work on (say) a SATA or IDE machine, sorry.

Robert Crocombe

Dug up a PATA disk for the same machine and installed Fedora Core 5 (as before). Resumed testing with a new kernel based on 2.6.20-rt8 (slightly newer than previously). The config is also a bit different, since I have added oprofile support and turned on a few of Ingo's -rt debugging features to poke at them and see if I could use them to figure out where the larger delays are coming from (no luck as yet). It's perhaps a bit better, but the problems are still there. From the serial console:

    ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
    ata1.00: cmd ca/00:08:d3:db:38/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
             res 40/00:00:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
    ata1.00: qc timeout (cmd 0xef)
    ata1.00: failed to set xfermode (err_mask=0x4)
    ata1.00: limiting speed to UDMA/44
    ata1: failed to recover some devices, retrying in 5 secs
    ata1.00: qc timeout (cmd 0xef)
    ata1.00: failed to set xfermode (err_mask=0x4)
    ata1.00: limiting speed to PIO0
    ata1: failed to recover some devices, retrying in 5 secs
    ata1.00: qc timeout (cmd 0xef)
    ata1.00: failed to set xfermode (err_mask=0x4)
    ata1.00: disabled
    end_request: I/O error, dev sda, sector 20503507
    Buffer I/O error on device sda3, logical block 2579
    lost page write due to I/O error on sda3
    Aborting journal on device sda3.
    end_request: I/O error, dev sda, sector 59281459
    Buffer I/O error on device sda3, logical block 4849823
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59281555
    Buffer I/O error on device sda3, logical block 4849835
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59281435
    Buffer I/O error on device sda3, logical block 4849820
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59280187
    Buffer I/O error on device sda3, logical block 4849664
    lost page write due to I/O error on sda3
    Buffer I/O error on device sda3, logical block 4849665
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59364323
    Buffer I/O error on device sda3, logical block 4860181
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59396603
    Buffer I/O error on device sda3, logical block 4864216
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59396987
    Buffer I/O error on device sda3, logical block 4864264
    lost page write due to I/O error on sda3
    Buffer I/O error on device sda3, logical block 4864265
    lost page write due to I/O error on sda3
    end_request: I/O error, dev sda, sector 59504435
    end_request: I/O error, dev sda, sector 60066619
    end_request: I/O error, dev sda, sector 84012738
    end_request: I/O error, dev sda, sector 84012978
    end_request: I/O error, dev sda, sector 84013098
    end_request: I/O error, dev sda, sector 84013178
    end_request: I/O error, dev sda, sector 84799786
    end_request: I/O error, dev sda, sector 84799810
    end_request: I/O error, dev sda, sector 84799858
    end_request: I/O error, dev sda, sector 84799890
    end_request: I/O error, dev sda, sector 84799978
    end_request: I/O error, dev sda, sector 84800026
    end_request: I/O error, dev sda, sector 85325298
    end_request: I/O error, dev sda, sector 85325314
    end_request: I/O error, dev sda, sector 85325402
    end_request: I/O error, dev sda, sector 94236554
    end_request: I/O error, dev sda, sector 95022770
    end_request: I/O error, dev sda, sector 95022962
    end_request: I/O error, dev sda, sector 95023026
    end_request: I/O error, dev sda, sector 95023058
    end_request: I/O error, dev sda, sector 95023122
    end_request: I/O error, dev sda, sector 95023218
    end_request: I/O error, dev sda, sector 104984242
    end_request: I/O error, dev sda, sector 122285690
    end_request: I/O error, dev sda, sector 122286754
    end_request: I/O error, dev sda, sector 133367024
    end_request: I/O error, dev sda, sector 63
    end_request: I/O error, dev sda, sector 1315863
    end_request: I/O error, dev sda, sector 12058703
    end_request: I/O error, dev sda, sector 14417983
    end_request: I/O error, dev sda, sector 14565439
    end_request: I/O error, dev sda, sector 14680143
    end_request: I/O error, dev sda, sector 15728727
    end_request: I/O error, dev sda, sector 15728823
    end_request: I/O error, dev sda, sector 15728903
    end_request: I/O error, dev sda, sector 15728975
    end_request: I/O error, dev sda, sector 15728983
    end_request: I/O error, dev sda, sector 15729095
    end_request: I/O error, dev sda, sector 15729159
    end_request: I/O error, dev sda, sector 20482875
    end_request: I/O error, dev sda, sector 20482891
    end_request: I/O error, dev sda, sector 133367024
    EXT3-fs error (device sda6): ext3_get_inode_loc: unable to read inode block - inode=1296651, block=1310722
    end_request: I/O error, dev sda, sector 122910248
    end_request: I/O error, dev sda, sector 122881248
    EXT3-fs error (device sda6) in ext3_reserve_inode_write: IO failure
    end_request: I/O error, dev sda, sector 122881248
    end_request: I/O error, dev sda, sector 15728975
    EXT3-fs error (device sda1): ext3_get_inode_loc: unable to read inode block - inode=1946002, block=1966114
    end_request: I/O error, dev sda, sector 63
    EXT3-fs error (device sda1) in ext3_reserve_inode_write: IO failure
    end_request: I/O error, dev sda, sector 63
    end_request: I/O error, dev sda, sector 25319
    end_request: I/O error, dev sda, sector 25367
    Aborting journal on device sda1.
    end_request: I/O error, dev sda, sector 122910272
    Aborting journal on device sda6.
    sda : READ CAPACITY failed.
    sda : status=0, message=00, host=4, driver=00
    sda : sense not available.
    sda: Write Protect is off
    sda: asking for cache data failed
    sda: assuming drive cache: write through
    ext3_abort called.
    EXT3-fs error (device sda6): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    printk: 72 messages suppressed.
    Buffer I/O error on device sda6, logical block 1048578
    lost page write due to I/O error on sda6
    Buffer I/O error on device sda6, logical block 1310722
    lost page write due to I/O error on sda6
    Buffer I/O error on device sda1, logical block 164469
    lost page write due to I/O error on sda1
    Buffer I/O error on device sda1, logical block 164470
    lost page write due to I/O error on sda1
    Buffer I/O error on device sda1, logical block 164473
    lost page write due to I/O error on sda1
    Buffer I/O error on device sda1, logical block 164477
    lost page write due to I/O error on sda1
    ext3_abort called.
    EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    EXT3-fs error (device sda1): ext3_find_entry: reading directory #1782881 offset 0
    ext3_abort called.
    EXT3-fs error (device sda3): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    printk: 22 messages suppressed.
    Buffer I/O error on device sda5, logical block 3071
    lost page write due to I/O error on sda5
    Aborting journal on device sda5.
    ext3_abort called.
    EXT3-fs error (device sda5): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    Buffer I/O error on device sda5, logical block 1343561
    lost page write due to I/O error on sda5
    Buffer I/O error on device sda5, logical block 1638410
    lost page write due to I/O error on sda5
    Buffer I/O error on device sda5, logical block 4784130
    lost page write due to I/O error on sda5
    Buffer I/O error on device sda5, logical block 4915280
    lost page write due to I/O error on sda5

This is with the ohci1394 interrupt chrt'd up to 90 and the task's priority set to 90 as well (see the sketch after the attachment list below for what that means in practice).

Created attachment 10585 [details]
Kernel config for 2.6.20-rt8
Created attachment 10586 [details]
dmesg for 2.6.20-rt8
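
A note on the priorities mentioned above: `chrt -f -p 90 <pid>` puts a thread into the SCHED_FIFO class at priority 90, and on -rt kernels interrupt handlers run as kernel threads that can be boosted the same way. The minimal sketch below shows the equivalent call from inside a task; it is illustrative only, not the reporter's code, and mainly shows why a task at this priority that never blocks can starve lower-priority kernel threads, per the starvation hypothesis above.

```c
/*
 * Illustrative sketch, not the reporter's code: put the calling task
 * into SCHED_FIFO at priority 90, which is what `chrt -f -p 90 <pid>`
 * does from the outside.  A SCHED_FIFO task that never blocks will
 * starve every lower-priority thread on its CPU, including kernel
 * threads such as the threaded IRQ handlers on -rt kernels.
 */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 90 };

    /* pid 0 means "the calling process"; requires root/CAP_SYS_NICE. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }

    /*
     * Real-time work goes here.  It must block (I/O, nanosleep, ...)
     * often enough that lower-priority threads still get CPU time.
     */
    return 0;
}
```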
I didn't realize how the 1394 driver was set up to handle isochronous transmit, and used code based around two threads that I had originally written for asynchronous receive. I figured that the read() within raw1394_loop_iterate() would unblock once per cycle start or so (8k/second), but no, it was instead a straight poll, which led to me making calls into raw1394_ioctl at a rate of about 450,000 calls per second. After due consideration, I have this down to 1 per packet (1200/second), and things are working swimmingly. Mea culpa. Sorry for the noise.
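
The shape of the fix, for anyone hitting the same trap: block until the kernel actually has an event queued before calling raw1394_loop_iterate(), rather than spinning on it. A minimal sketch, assuming the libraw1394 API of the time (raw1394_get_fd() exposes the handle's file descriptor for poll()); the port number and error handling are placeholders, and this is not the reporter's actual code:

```c
/*
 * Minimal sketch, assuming libraw1394: sleep in poll() on the raw1394
 * handle's fd and dispatch exactly one pending event per wakeup,
 * instead of busy-calling raw1394_loop_iterate() in a tight loop
 * (which showed up as ~450,000 ioctl calls per second above).
 */
#include <poll.h>
#include <stdio.h>
#include <libraw1394/raw1394.h>

int main(void)
{
    raw1394handle_t h = raw1394_new_handle();
    if (!h) {
        perror("raw1394_new_handle");
        return 1;
    }
    raw1394_set_port(h, 0);              /* first adapter; placeholder */

    struct pollfd pfd = {
        .fd     = raw1394_get_fd(h),     /* fd backing the handle */
        .events = POLLIN,
    };

    for (;;) {
        /* Block until an event (e.g. an iso packet completion) is
         * queued: roughly one wakeup per packet, ~1200/second here. */
        if (poll(&pfd, 1, -1) < 0)
            break;
        if (raw1394_loop_iterate(h) < 0) /* handle one event */
            break;
    }

    raw1394_destroy_handle(h);
    return 0;
}
```

At ~1200 packets per second this cuts the call rate by more than two orders of magnitude compared to the straight poll, which matches the numbers in the comment above.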