Most recent kernel where this bug did *NOT* occur: 2.6.16-rt29
Distribution: Fedora Core 5
Hardware Environment: 4-processor IWill H8502: 2.8GHz Opterons, 10GB of RAM
Software Environment:
Problem Description:

I'm trying to transmit 1394b isochronous data at 1200Hz without packet buffering: the data contents are generated within each 833us interval, and the interrupts that the packets generate on the receiver of the isochronous data drive the timing for an entire subsystem, so regularity is a must. I figure out which IRQ the particular 1394b port I'm using is routed to (by running a short version of the test and seeing where the interrupts show up in /proc/interrupts), and then I do a chrt -p 99 `pidof 'IRQ-XYZ'`. The process itself does an mlockall() and uses sched_setscheduler() to set its own priority to 99. I also tried a little lower, with a priority of 96 for both the process and the IRQ thread, in case I was just really, really starving the kernel, but it seems to make no difference.

With 2.6.16-rt29 I was getting mysterious gaps of almost exactly 100ms periodically. I was hoping a more recent kernel might improve things, and in fact it largely does work better, but it also explodes (usually). My goal is to be able to go 300 seconds without losing a packet. Sometimes I can, and sometimes I get this on the console:

Message from syslogd@bubba at Fri Feb 9 16:15:13 2007 ...
bubba kernel: journal commit I/O error

Before this happens, I see some part of the following on the serial console:

scsi1: Device reset code sleeping
e reset code sleepingimer expired ( 1)
scsi1: Device reset timer expired (active 1)
scsi1: Device reset returning 0x2003
Recovery SCB completes
Recovery SCB completes
scsi1: At time of recovery, card was paused

and then presumably things progress and I get the rest of that plus this:

>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi1: Dumping Card State at program address 0x7 Mode 0x33
Card was paused
INTSTAT[0x0] SELOID[0x0] SELID[0x10] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0x0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x2]
KERNEL_QFREEZE_COUNT[0x2] MK_MESSAGE_SCB[0xff00]
MK_MESSAGE_SCSIID[0xff] SSTAT0[0x0] SSTAT1[0x0] SSTAT2[0x0]
SSTAT3[0x0] PERRDIAG[0x0] SIMODE1[0xa4] LQISTAT0[0x0]
LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0]
LQOSTAT2[0xe1]
SCB Count = 64 CMDS_PENDING = 0 LASTSCB 0x3c CURRSCB 0x2 NEXTSCB 0xffc0
qinstart = 20024 qinfifonext = 20025
QINFIFO: 0x2
WAITING_TID_QUEUES:
Pending list: 2 FIFO_USE[0x0] SCB_CONTROL[0x0] SCB_SCSIID[0x7]
Total 1
Kernel Free SCB list: 43 60 23 30 38 5 15 59 17 44 35 3 45 62 8 48 26 41 0 52 12 46 19 21 31 32 36 25 54 34 37 53 58 27 4 40 20 33 9 6 29 42 50 1 11 22 24 16 18 47 61 56 63 28 10 14 51 39 55 57 49 7 13
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi1: FIFO0 Free, LONGJMP == 0x8072, SCB 0x2
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi1: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x8 0x0 0x0 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
scsi1: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc] CCSCBCTL[0x4]
scsi1: REG0 == 0x1d, SINDEX = 0x100, DINDEX = 0x102
scsi1: SCBPTR == 0x2, SCB_NEXT == 0xffc0, SCB_NEXT2 == 0xffc4
CDB 0 0 0 0 0 0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi1:0:0:0: Cmd aborted from QINFIFO
scsi1: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi1: Dumping Card State at program address 0x12b Mode 0x33
Card was paused
INTSTAT[0x0] SELOID[0x1] SELID[0x10] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0xb6] SCSIPHASE[0x4] SCSIBUS[0xc0] LASTPHASE[0xa0]
SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0x40] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x2]
KERNEL_QFREEZE_COUNT[0x2] MK_MESSAGE_SCB[0xff00]
MK_MESSAGE_SCSIID[0xff] SSTAT0[0x2] SSTAT1[0x1] SSTAT2[0x0]
SSTAT3[0x0] PERRDIAG[0x0] SIMODE1[0xac] LQISTAT0[0x0]
LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0] LQOSTAT1[0x0]
LQOSTAT2[0x0]
SCB Count = 64 CMDS_PENDING = 1 LASTSCB 0x3c CURRSCB 0x2 NEXTSCB 0xffc0
qinstart = 20025 qinfifonext = 20025
QINFIFO:
WAITING_TID_QUEUES:
Pending list: 2 FIFO_USE[0x0] SCB_CONTROL[0x62] SCB_SCSIID[0x17]
Total 1
Kernel Free SCB list: 43 60 23 30 38 5 15 59 17 44 35 3 45 62 8 48 26 41 0 52 12 46 19 21 31 32 36 25 54 34 37 53 58 27 4 40 20 33 9 6 29 42 50 1 11 22 24 16 18 47 61 56 63 28 10 14 51 39 55 57 49 7 13
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:
scsi1: FIFO0 Free, LONGJMP == 0x8072, SCB 0x2
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi1: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x8 0x0 0x0 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
scsi1: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc] CCSCBCTL[0x4]
scsi1: REG0 == 0x1d, SINDEX = 0x107, DINDEX = 0x104
scsi1: SCBPTR == 0x2, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xffc4
CDB 0 0 0 0 0 0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi1: Recovery code sleeping
scsi1: Timer Expired (active 1)
Recovery code awake
scsi1: Command abort returning 0x2003
sd 1:0:0:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sda1, disabling device.
	Operation continuing on 1 devices
end_request: I/O error, dev sdb, sector 575673023
md: super_written gets error=-5, uptodate=0
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sda1
 disk 1, wo:0, o:1, dev:sdb1
RAID1 conf printout:
stopped custom tracer.
Unable to handle kernel NULL pointer dereference at 0000000000000018
RIP: [<ffffffff8026a8d6>] gart_unmap_sg+0x18/0x4d
PGD 18019d067 PUD 27921e067 PMD 0
Oops: 0000 [1] PREEMPT SMP
CPU 3
Modules linked in: raw1394 ohci1394 ieee1394 tg3
Pid: 497, comm: IRQ-89 Not tainted 2.6.20-rt2_02 #2
RIP: 0010:[<ffffffff8026a8d6>]  [<ffffffff8026a8d6>] gart_unmap_sg+0x18/0x4d
RSP: 0000:ffff8100cbf3fca0  EFLAGS: 00010297
RAX: ffffffff804e1780 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8101803d0870
RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000ff00
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000001 R14: ffff8101803d0870 R15: 0000000000000001
FS:  00002acba9a57140(0000) GS:ffff8101801007c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000278d9e000 CR4: 00000000000006e0
Process IRQ-89 (pid: 497, threadinfo ffff8100cbf3e000, task ffff8100cbc4a0c0)
Stack:  ffff810081d14000 ffff8100cbf08b00 ffff8100cbe955f8 0000000000000000
 0000000000000002 ffffffff80376f41 ffff810081d14000 ffff810277619180
 0000000481d14000 0000000000000002 0000000000000003 000000038036a150
Call Trace:
 [<ffffffff80376f41>] ahd_done+0xe9/0x58d
 [<ffffffff80367e6e>] ahd_abort_scbs+0x281/0x32a
 [<ffffffff8036f2f5>] ahd_handle_scsiint+0x1470/0x17b5
 [<ffffffff80377b1e>] ahd_linux_isr+0x16d/0x18a
 [<ffffffff8020f65a>] handle_IRQ_event+0x50/0xdf
 [<ffffffff8029990d>] thread_simple_irq+0x44/0x7a
 [<ffffffff802889cb>] keventd_create_kthread+0x0/0x6d
 [<ffffffff8029a0fc>] do_irqd+0x10a/0x32c
 [<ffffffff80299ff2>] do_irqd+0x0/0x32c
 [<ffffffff802889cb>] keventd_create_kthread+0x0/0x6d
 [<ffffffff8022f6a4>] kthread+0xd1/0x100
 [<ffffffff80257b08>] child_rip+0xa/0x12
 [<ffffffff802889cb>] keventd_create_kthread+0x0/0x6d
 [<ffffffff8022f5d3>] kthread+0x0/0x100
 [<ffffffff80257afe>] child_rip+0x0/0x12

Code: 8b 43 18 48 89 da 85 c0 74 22 48 83 c3 20 83 7a 0c 00 74 18
RIP  [<ffffffff8026a8d6>] gart_unmap_sg+0x18/0x4d
 RSP <ffff8100cbf3fca0>
CR2: 0000000000000018
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdb1
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7672399
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 29287747
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 8581
lost page write due to I/O error on md0
Aborting journal on device md0.
ext3_abort called.
journal commit I/O error
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 3538082
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 3538084
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7502089
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633082
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633083
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633086
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633089
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 21850922
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 21850923
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 21850964
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0

At which point I can't write anything to disk, etc., so it's SysRq-S, SysRq-U, SysRq-T, SysRq-L, SysRq-B time.

Steps to reproduce:

Oh boy. The simplest setup would be two threads: first set priority, memory locking, etc., then do isochronous transmit via libraw1394. Fork a thread which calls raw1394_loop_iterate(). The other thread right now simply busy-waits for 1/1200th of a second and then does a write that updates a little state. The first thread's continuous calling of raw1394_loop_iterate() results in the kernel calling a registered iso xmit callback function, which notices that a write has occurred, does the necessary assignments within the callback to transmit (in my case) 48 bytes, and returns RAW1394_ISO_OK. If the callback finds that no write has occurred since the last transmit, it returns RAW1394_ISO_AGAIN (and nothing is output). This is sketched below.
Usually the thread that is busy-waiting will be running a simulation. If necessary, I can probably cook something up.
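For concreteness, the priority and locking setup described above boils down to something like this (a stripped-down sketch, not the actual test code; the helper name and error handling are illustrative):

	/* Minimal sketch of the RT setup: lock memory, then go SCHED_FIFO.
	 * Illustrative only. */
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>

	static void become_realtime(int prio)
	{
		struct sched_param sp = { .sched_priority = prio };

		/* Pin current and future pages so the RT thread never page-faults. */
		if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
			perror("mlockall");
			exit(EXIT_FAILURE);
		}

		/* SCHED_FIFO at the same priority given to the IRQ thread via chrt. */
		if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
			perror("sched_setscheduler");
			exit(EXIT_FAILURE);
		}
	}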
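And the transmit side, in outline (again a stripped-down sketch against libraw1394's iso xmit interface; the "pending" flag and "payload" buffer are stand-ins for the real shared state):

	/* Sketch of the iso xmit callback described above; "pending" and
	 * "payload" are illustrative stand-ins for the real shared state. */
	#include <string.h>
	#include <libraw1394/raw1394.h>

	static volatile int pending;       /* set by the busy-waiting thread's write */
	static unsigned char payload[48];  /* data generated in the last 833us slot */

	static enum raw1394_iso_disposition
	xmit_handler(raw1394handle_t handle, unsigned char *data, unsigned int *len,
		     unsigned char *tag, unsigned char *sy, int cycle,
		     unsigned int dropped)
	{
		if (!pending)
			return RAW1394_ISO_AGAIN;  /* no write since last transmit */

		memcpy(data, payload, sizeof(payload));
		*len = sizeof(payload);
		*tag = 0;
		*sy  = 0;
		pending = 0;
		return RAW1394_ISO_OK;             /* 48 bytes go out this packet */
	}

	/* The first thread just spins the event loop, which runs the handler: */
	static void *loop_thread(void *arg)
	{
		raw1394handle_t handle = arg;

		for (;;)
			raw1394_loop_iterate(handle);
		return NULL;
	}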
Created attachment 10373 [details] dmesg
Created attachment 10374 [details] Kernel config
Reply-To: akpm@linux-foundation.org

On Fri, 9 Feb 2007 15:45:31 -0800 bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=7979
>
>            Summary: 2.6.20-rt2: high prio task yields journal commit I/O
>                     error

Well you've been having fun there. I'd say that you've hit two (maybe 1.5) bugs in the scsi code:

- I assume that your high-priority task has starved a scsi kernel thread for so long that when that thread finally got control, it decided that something had timed out and declared an error. Maybe. Or perhaps the card decided that it hadn't been serviced for so long that it declared an error. It would need someone who is familiar with scsi and aic7xxx to determine that.

- In response to the timeout, aic7xxx error handling went and passed crap into the scatter/gather unmapping code and the kernel oopsed.

Frankly, I doubt if either of these things (or at least, the first one) is likely to be fixed in a hurry, and I'd suggest that you look at continuing your work on (say) a SATA or IDE machine, sorry.
Dug up a PATA disk for the same machine and installed Fedora Core 5 (as before). Resumed testing with a new kernel based on 2.6.20-rt8 (slightly newer than before). The config is also a bit different, since I have added oprofile support and turned on a few of Ingo's -rt debugging features, to poke at them and see if I could use them to figure out where the larger delays are coming from (uhm, not as yet). It's perhaps a bit better, but the problems are still there. From the serial console:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:d3:db:38/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: limiting speed to UDMA/44
ata1: failed to recover some devices, retrying in 5 secs
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: limiting speed to PIO0
ata1: failed to recover some devices, retrying in 5 secs
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: disabled
end_request: I/O error, dev sda, sector 20503507
Buffer I/O error on device sda3, logical block 2579
lost page write due to I/O error on sda3
Aborting journal on device sda3.
end_request: I/O error, dev sda, sector 59281459
Buffer I/O error on device sda3, logical block 4849823
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59281555
Buffer I/O error on device sda3, logical block 4849835
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59281435
Buffer I/O error on device sda3, logical block 4849820
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59280187
Buffer I/O error on device sda3, logical block 4849664
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 4849665
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59364323
Buffer I/O error on device sda3, logical block 4860181
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59396603
Buffer I/O error on device sda3, logical block 4864216
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59396987
Buffer I/O error on device sda3, logical block 4864264
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 4864265
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59504435
end_request: I/O error, dev sda, sector 60066619
end_request: I/O error, dev sda, sector 84012738
end_request: I/O error, dev sda, sector 84012978
end_request: I/O error, dev sda, sector 84013098
end_request: I/O error, dev sda, sector 84013178
end_request: I/O error, dev sda, sector 84799786
end_request: I/O error, dev sda, sector 84799810
end_request: I/O error, dev sda, sector 84799858
end_request: I/O error, dev sda, sector 84799890
end_request: I/O error, dev sda, sector 84799978
end_request: I/O error, dev sda, sector 84800026
end_request: I/O error, dev sda, sector 85325298
end_request: I/O error, dev sda, sector 85325314
end_request: I/O error, dev sda, sector 85325402
end_request: I/O error, dev sda, sector 94236554
end_request: I/O error, dev sda, sector 95022770
end_request: I/O error, dev sda, sector 95022962
end_request: I/O error, dev sda, sector 95023026
end_request: I/O error, dev sda, sector 95023058
end_request: I/O error, dev sda, sector 95023122
end_request: I/O error, dev sda, sector 95023218
end_request: I/O error, dev sda, sector 104984242
end_request: I/O error, dev sda, sector 122285690
end_request: I/O error, dev sda, sector 122286754
end_request: I/O error, dev sda, sector 133367024
end_request: I/O error, dev sda, sector 63
end_request: I/O error, dev sda, sector 1315863
end_request: I/O error, dev sda, sector 12058703
end_request: I/O error, dev sda, sector 14417983
end_request: I/O error, dev sda, sector 14565439
end_request: I/O error, dev sda, sector 14680143
end_request: I/O error, dev sda, sector 15728727
end_request: I/O error, dev sda, sector 15728823
end_request: I/O error, dev sda, sector 15728903
end_request: I/O error, dev sda, sector 15728975
end_request: I/O error, dev sda, sector 15728983
end_request: I/O error, dev sda, sector 15729095
end_request: I/O error, dev sda, sector 15729159
end_request: I/O error, dev sda, sector 20482875
end_request: I/O error, dev sda, sector 20482891
end_request: I/O error, dev sda, sector 133367024
EXT3-fs error (device sda6): ext3_get_inode_loc: unable to read inode block - inode=1296651, block=1310722
end_request: I/O error, dev sda, sector 122910248
end_request: I/O error, dev sda, sector 122881248
EXT3-fs error (device sda6) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev sda, sector 122881248
end_request: I/O error, dev sda, sector 15728975
EXT3-fs error (device sda1): ext3_get_inode_loc: unable to read inode block - inode=1946002, block=1966114
end_request: I/O error, dev sda, sector 63
EXT3-fs error (device sda1) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev sda, sector 63
end_request: I/O error, dev sda, sector 25319
end_request: I/O error, dev sda, sector 25367
Aborting journal on device sda1.
end_request: I/O error, dev sda, sector 122910272
Aborting journal on device sda6.
sda : READ CAPACITY failed.
sda : status=0, message=00, host=4, driver=00
sda : sense not available.
sda: Write Protect is off
sda: asking for cache data failed
sda: assuming drive cache: write through
ext3_abort called.
EXT3-fs error (device sda6): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
printk: 72 messages suppressed.
Buffer I/O error on device sda6, logical block 1048578
lost page write due to I/O error on sda6
Buffer I/O error on device sda6, logical block 1310722
lost page write due to I/O error on sda6
Buffer I/O error on device sda1, logical block 164469
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 164470
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 164473
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 164477
lost page write due to I/O error on sda1
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device sda1): ext3_find_entry: reading directory #1782881 offset 0
ext3_abort called.
EXT3-fs error (device sda3): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
printk: 22 messages suppressed.
Buffer I/O error on device sda5, logical block 3071
lost page write due to I/O error on sda5
Aborting journal on device sda5.
ext3_abort called.
EXT3-fs error (device sda5): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
Buffer I/O error on device sda5, logical block 1343561
lost page write due to I/O error on sda5
Buffer I/O error on device sda5, logical block 1638410
lost page write due to I/O error on sda5
Buffer I/O error on device sda5, logical block 4784130
lost page write due to I/O error on sda5
Buffer I/O error on device sda5, logical block 4915280
lost page write due to I/O error on sda5

This is with the ohci1394 interrupt thread chrt'd up to 90 and the task's priority set to 90 as well.
Created attachment 10585 [details] Kernel config for 2.6.20-rt8
Created attachment 10586 [details] dmesg for 2.6.20-rt8
I didn't realize how the 1394 driver was set up to handle isochronous transmit, and used code based around two threads that I had originally written for asynchronous receive. I figured that the read() within raw1394_loop_iterate() would unblock once per cycle start or something (8k/second), but no, it was instead a straight poll, which led to me making calls into raw1394_ioctl() at a rate of about 450,000 per second. After due consideration, I have this down to one per packet (1200/second), and things are working swimmingly. Mea culpa. Sorry for the noise.
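(For anyone who finds this later: one way to get from a straight poll down to roughly one iteration per packet -- an illustrative guess, not necessarily the exact change made here -- is to block on the handle's file descriptor and only iterate when the kernel signals activity:)

	/* Hypothetical pacing loop: sleep in poll() on the raw1394 fd so that
	 * raw1394_loop_iterate() runs only when the kernel has work, rather
	 * than ~450,000 times per second. */
	#include <poll.h>
	#include <libraw1394/raw1394.h>

	static void iso_event_loop(raw1394handle_t handle)
	{
		struct pollfd pfd = {
			.fd     = raw1394_get_fd(handle),  /* fd behind the handle */
			.events = POLLIN,
		};

		for (;;) {
			if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
				raw1394_loop_iterate(handle);  /* runs the callback */
		}
	}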