Bug 7979 - 2.6.20-rt2: high prio task yields journal commit I/O error
Summary: 2.6.20-rt2: high prio task yields journal commit I/O error
Status: REJECTED INVALID
Alias: None
Product: Alternate Trees
Classification: Unclassified
Component: rt
Hardware: i386 Linux
Importance: P2 normal
Assignee: alt-trees_rt
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-09 15:35 UTC by Robert Crocombe
Modified: 2007-03-13 19:53 UTC

See Also:
Kernel Version: 2.6.20-rt2
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg (22.12 KB, text/plain) - 2007-02-09 15:37 UTC, Robert Crocombe
Kernel config (24.08 KB, text/plain) - 2007-02-09 15:37 UTC, Robert Crocombe
Kernel config for 2.6.20-rt8 (27.26 KB, text/plain) - 2007-03-02 12:44 UTC, Robert Crocombe
dmesg for 2.6.20-rt8 (23.28 KB, text/plain) - 2007-03-02 12:44 UTC, Robert Crocombe

Description Robert Crocombe 2007-02-09 15:35:52 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.16-rt29
Distribution: Fedora Core 5
Hardware Environment: 4-processor IWill H8502: 2.8 GHz Opterons, 10 GB of RAM
Software Environment:
Problem Description:

I'm trying to transmit 1394b isochronous data at 1200 Hz without packet
buffering: the data contents are generated within each 833 us interval, and the
interrupts that the packets generate on the receiver of the isochronous data
drive the timing for an entire subsystem (so regularity is a must).

I figure out which IRQ the particular 1394b port I'm using is routed to (by
running a short version of the test and seeing where the interrupts show up in
/proc/interrupts), and then I do a chrt -p 99 `pidof 'IRQ-XYZ'`.  The process
itself does an mlockall() and uses sched_setscheduler() to set its own priority
to 99.  I also tried a little lower, with a priority of 96 for both the process
and the IRQ thread, in case I was just really, really starving the kernel, but
it seems to make no difference.

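A minimal sketch of the in-process setup just described, assuming SCHED_FIFO at
priority 99 as above (error handling trimmed; the IRQ thread itself is boosted
separately from the shell with chrt, as shown earlier):

#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

/* Lock memory and move this process into the real-time FIFO class at the
 * given priority (99, or 96 in the lower-priority experiment above). */
static int become_realtime(int prio)
{
        struct sched_param sp = { .sched_priority = prio };

        /* Lock current and future pages so paging cannot add latency. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
                perror("mlockall");
                return -1;
        }
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
                perror("sched_setscheduler");
                return -1;
        }
        return 0;
}
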
With 2.6.16-rt29 I was getting mysterious gaps of almost exactly 100 ms
periodically.  I was hoping a more recent kernel might improve things, and in
fact it largely does work better, but it also (usually) explodes.  My goal is
to be able to go 300 seconds without losing a packet.  Sometimes I can, and
sometimes I get this on the console:

Message from syslogd@bubba at Fri Feb  9 16:15:13 2007 ...
bubba kernel: journal commit I/O error

Before this happens, I see some part of the following on the serial console:

scsi1: Device reset code sleeping
e reset code sleepingimer expired ( 1)
scsi1: Device reset timer expired (active 1)
scsi1: Device reset returning 0x2003
Recovery SCB completes
Recovery SCB completes
scsi1: At time of recovery, card was paused

and then presumably things progress and I get the rest of that plus this:

>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi1: Dumping Card State at program address 0x7 Mode 0x33
Card was paused
INTSTAT[0x0] SELOID[0x0] SELID[0x10] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0x0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x2]
KERNEL_QFREEZE_COUNT[0x2] MK_MESSAGE_SCB[0xff00]
MK_MESSAGE_SCSIID[0xff] SSTAT0[0x0] SSTAT1[0x0]
SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] SIMODE1[0xa4]
LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0]
LQOSTAT1[0x0] LQOSTAT2[0xe1]

SCB Count = 64 CMDS_PENDING = 0 LASTSCB 0x3c CURRSCB 0x2 NEXTSCB 0xffc0
qinstart = 20024 qinfifonext = 20025
QINFIFO: 0x2
WAITING_TID_QUEUES:
Pending list:
  2 FIFO_USE[0x0] SCB_CONTROL[0x0] SCB_SCSIID[0x7]
Total 1
Kernel Free SCB list: 43 60 23 30 38 5 15 59 17 44 35 3 45 62 8 48 26 41 0 52 12
46 19 21 31 32 36 25 54 34 37 53 58 27 4 40 20 33 9 6 29 42 50 1 11 22 24 16 18
47 61 56 63 28 10 14 51 39 55 57 49 7 13
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:


scsi1: FIFO0 Free, LONGJMP == 0x8072, SCB 0x2
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]

scsi1: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x8 0x0 0x0 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0
scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
scsi1: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x4]
scsi1: REG0 == 0x1d, SINDEX = 0x100, DINDEX = 0x102
scsi1: SCBPTR == 0x2, SCB_NEXT == 0xffc0, SCB_NEXT2 == 0xffc4
CDB 0 0 0 0 0 0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi1:0:0:0: Cmd aborted from QINFIFO
scsi1: At time of recovery, card was paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi1: Dumping Card State at program address 0x12b Mode 0x33
Card was paused
INTSTAT[0x0] SELOID[0x1] SELID[0x10] HS_MAILBOX[0x0]
INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
SCSISIGI[0xb6] SCSIPHASE[0x4] SCSIBUS[0xc0] LASTPHASE[0xa0]
SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0] SEQINTCTL[0x0]
SEQ_FLAGS[0x40] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x2]
KERNEL_QFREEZE_COUNT[0x2] MK_MESSAGE_SCB[0xff00]
MK_MESSAGE_SCSIID[0xff] SSTAT0[0x2] SSTAT1[0x1]
SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] SIMODE1[0xac]
LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0]
LQOSTAT1[0x0] LQOSTAT2[0x0]

SCB Count = 64 CMDS_PENDING = 1 LASTSCB 0x3c CURRSCB 0x2 NEXTSCB 0xffc0
qinstart = 20025 qinfifonext = 20025
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
  2 FIFO_USE[0x0] SCB_CONTROL[0x62] SCB_SCSIID[0x17]
Total 1
Kernel Free SCB list: 43 60 23 30 38 5 15 59 17 44 35 3 45 62 8 48 26 41 0 52 12
46 19 21 31 32 36 25 54 34 37 53 58 27 4 40 20 33 9 6 29 42 50 1 11 22 24 16 18
47 61 56 63 28 10 14 51 39 55 57 49 7 13
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
Sequencer On QFreeze and Complete list:


scsi1: FIFO0 Free, LONGJMP == 0x8072, SCB 0x2
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]

scsi1: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x8 0x0 0x0 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
0x0 0x0
scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
scsi1: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
SIMODE0[0xc]
CCSCBCTL[0x4]
scsi1: REG0 == 0x1d, SINDEX = 0x107, DINDEX = 0x104
scsi1: SCBPTR == 0x2, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xffc4
CDB 0 0 0 0 0 0
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi1: Recovery code sleeping
scsi1: Timer Expired (active 1)
Recovery code awake
scsi1: Command abort returning 0x2003
sd 1:0:0:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sda1, disabling device.
        Operation continuing on 1 devices
end_request: I/O error, dev sdb, sector 575673023
md: super_written gets error=-5, uptodate=0
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sda1
 disk 1, wo:0, o:1, dev:sdb1
RAID1 conf printout:
stopped custom tracer.
Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
 [<ffffffff8026a8d6>] gart_unmap_sg+0x18/0x4d
PGD 18019d067 PUD 27921e067 PMD 0
Oops: 0000 [1] PREEMPT SMP
CPU 3
Modules linked in: raw1394 ohci1394 ieee1394 tg3
Pid: 497, comm: IRQ-89 Not tainted 2.6.20-rt2_02 #2
RIP: 0010:[<ffffffff8026a8d6>]  [<ffffffff8026a8d6>] gart_unmap_sg+0x18/0x4d
RSP: 0000:ffff8100cbf3fca0  EFLAGS: 00010297
RAX: ffffffff804e1780 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8101803d0870
RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000ff00
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000001 R14: ffff8101803d0870 R15: 0000000000000001
FS:  00002acba9a57140(0000) GS:ffff8101801007c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000278d9e000 CR4: 00000000000006e0
Process IRQ-89 (pid: 497, threadinfo ffff8100cbf3e000, task ffff8100cbc4a0c0)
Stack:  ffff810081d14000 ffff8100cbf08b00 ffff8100cbe955f8 0000000000000000
 0000000000000002 ffffffff80376f41 ffff810081d14000 ffff810277619180
 0000000481d14000 0000000000000002 0000000000000003 000000038036a150
Call Trace:
 [<ffffffff80376f41>] ahd_done+0xe9/0x58d
 [<ffffffff80367e6e>] ahd_abort_scbs+0x281/0x32a
 [<ffffffff8036f2f5>] ahd_handle_scsiint+0x1470/0x17b5
 [<ffffffff80377b1e>] ahd_linux_isr+0x16d/0x18a
 [<ffffffff8020f65a>] handle_IRQ_event+0x50/0xdf
 [<ffffffff8029990d>] thread_simple_irq+0x44/0x7a
 [<ffffffff802889cb>] keventd_create_kthread+0x0/0x6d
 [<ffffffff8029a0fc>] do_irqd+0x10a/0x32c
 [<ffffffff80299ff2>] do_irqd+0x0/0x32c
 [<ffffffff802889cb>] keventd_create_kthread+0x0/0x6d
 [<ffffffff8022f6a4>] kthread+0xd1/0x100
 [<ffffffff80257b08>] child_rip+0xa/0x12
 [<ffffffff802889cb>] keventd_create_kthread+0x0/0x6d
 [<ffffffff8022f5d3>] kthread+0x0/0x100
 [<ffffffff80257afe>] child_rip+0x0/0x12


Code: 8b 43 18 48 89 da 85 c0 74 22 48 83 c3 20 83 7a 0c 00 74 18
RIP  [<ffffffff8026a8d6>] gart_unmap_sg+0x18/0x4d
 RSP <ffff8100cbf3fca0>
CR2: 0000000000000018
  --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdb1
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7672399
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 29287747
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 8581
lost page write due to I/O error on md0
Aborting journal on device md0.
ext3_abort called.
journal commit I/O error
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 3538082
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 3538084
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7502089
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633082
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633083
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633086
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 7633089
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 21850922
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 21850923
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
Buffer I/O error on device md0, logical block 21850964
lost page write due to I/O error on md0
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
sd 1:0:1:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0

at which point I can't write anything to disk, etc., so it's Sysrq-S, Sysrq-U,
Sysrq-T, Sysrq-L, Sysrq-B time.


Steps to reproduce:

Oh boy.  The simplest setup would be two threads.  First set priority, locking,
etc., and then do isochronous transmit via libraw1394.  Fork a thread which
calls raw1394_loop_iterate() continuously.  The other thread right now simply
busy-waits for 1/1200th of a second and then does a write that updates a little
state.  The first thread's continuous calling of raw1394_loop_iterate() results
in the kernel calling a registered iso xmit callback function, which notices
that a write has occurred, assigns things as necessary within the callback to
transmit (in my case) 48 bytes, and returns RAW1394_ISO_OK.  If the callback
finds that no write has occurred since the last transmit, it returns
RAW1394_ISO_AGAIN (and nothing is output).  Usually the thread that is
busy-waiting will be running a simulation.

If necessary, I can probably cook something up.
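
A rough sketch of that two-thread structure, assuming the libraw1394 1.x
iso-xmit API (the handler signature and the raw1394_iso_xmit_init() arguments
should be checked against raw1394.h); names such as g_dirty, g_payload and
PAYLOAD_BYTES are illustrative only, not from the real code:

#include <pthread.h>
#include <string.h>
#include <libraw1394/raw1394.h>

#define PAYLOAD_BYTES 48                 /* 48 bytes per packet, as above */

static volatile int g_dirty;             /* set by the busy-waiting thread */
static unsigned char g_payload[PAYLOAD_BYTES];

/* Iso xmit callback: invoked from raw1394_loop_iterate() whenever the
 * kernel wants another packet for the channel. */
static enum raw1394_iso_disposition
xmit_handler(raw1394handle_t h, unsigned char *data, unsigned int *len,
             unsigned char *tag, unsigned char *sy, int cycle,
             unsigned int dropped)
{
        (void)h; (void)cycle; (void)dropped;

        if (!g_dirty)
                return RAW1394_ISO_AGAIN;       /* no new write: send nothing */

        memcpy(data, g_payload, PAYLOAD_BYTES);
        *len = PAYLOAD_BYTES;
        *tag = 0;
        *sy  = 0;
        g_dirty = 0;
        return RAW1394_ISO_OK;                  /* transmit this packet */
}

/* First thread: just pumps 1394 events so the callback above gets run. */
static void *iterate_thread(void *arg)
{
        raw1394handle_t h = arg;

        for (;;)
                raw1394_loop_iterate(h);
        return NULL;
}

/* Setup, roughly (error handling omitted, arguments assumed):
 *   raw1394handle_t h = raw1394_new_handle();
 *   raw1394_set_port(h, 0);
 *   raw1394_iso_xmit_init(h, xmit_handler, 64, PAYLOAD_BYTES, channel,
 *                         RAW1394_ISO_SPEED_400, -1);
 *   raw1394_iso_xmit_start(h, -1, -1);
 *   pthread_create(&tid, NULL, iterate_thread, h);
 * The second thread busy-waits 1/1200 s, updates g_payload, and sets
 * g_dirty = 1. */
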
Comment 1 Robert Crocombe 2007-02-09 15:37:19 UTC
Created attachment 10373 [details]
dmesg
Comment 2 Robert Crocombe 2007-02-09 15:37:53 UTC
Created attachment 10374 [details]
Kernel config
Comment 3 Anonymous Emailer 2007-02-09 16:08:31 UTC
Reply-To: akpm@linux-foundation.org

On Fri, 9 Feb 2007 15:45:31 -0800
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=7979
> 
>            Summary: 2.6.20-rt2: high prio task yields journal commit I/O
>                     error

Well you've been having fun there.

I'd say that you've hit two (maybe 1.5) bugs in the scsi code:

- I assume that your high-priority task has starved a scsi kernel thread
  for so long that when that thread finally got control, it decided that
  something had timed out and declared an error.

  Maybe.  Or perhaps the card decided that it hadn't been serviced for so
  long that it declared an error.  It would need someone who is familiar
  with scsi and aic7xxx to determine that.

- In response to the timeout, aic7xxx error handling went and passed crap
  into the scatter/gather unmapping code and the kernel oopsed.


Frankly, I doubt that either of these things (or at least, the first one) is
likely to be fixed in a hurry, and I'd suggest that you look at continuing
your work on (say) a SATA or IDE machine, sorry.

Comment 4 Robert Crocombe 2007-03-02 12:43:29 UTC
Dug up a PATA disk for the same machine and installed Fedora Core 5 (as
before).  Resumed testing with a new kernel based on 2.6.20-rt8 (slightly
newer than previously).  The config is also a bit different, since I have
added oprofile support and turned on a few of Ingo's -rt debugging features to
poke at them and see if I could use them to figure out where the larger delays
are coming from (no luck as yet).

It's perhaps a bit better, but the problems are still there.  From the serial
console:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:d3:db:38/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: limiting speed to UDMA/44
ata1: failed to recover some devices, retrying in 5 secs
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: limiting speed to PIO0
ata1: failed to recover some devices, retrying in 5 secs
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: disabled
end_request: I/O error, dev sda, sector 20503507
Buffer I/O error on device sda3, logical block 2579
lost page write due to I/O error on sda3
Aborting journal on device sda3.
end_request: I/O error, dev sda, sector 59281459
Buffer I/O error on device sda3, logical block 4849823
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59281555
Buffer I/O error on device sda3, logical block 4849835
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59281435
Buffer I/O error on device sda3, logical block 4849820
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59280187
Buffer I/O error on device sda3, logical block 4849664
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 4849665
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59364323
Buffer I/O error on device sda3, logical block 4860181
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59396603
Buffer I/O error on device sda3, logical block 4864216
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59396987
Buffer I/O error on device sda3, logical block 4864264
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 4864265
lost page write due to I/O error on sda3
end_request: I/O error, dev sda, sector 59504435
end_request: I/O error, dev sda, sector 60066619
end_request: I/O error, dev sda, sector 84012738
end_request: I/O error, dev sda, sector 84012978
end_request: I/O error, dev sda, sector 84013098
end_request: I/O error, dev sda, sector 84013178
end_request: I/O error, dev sda, sector 84799786
end_request: I/O error, dev sda, sector 84799810
end_request: I/O error, dev sda, sector 84799858
end_request: I/O error, dev sda, sector 84799890
end_request: I/O error, dev sda, sector 84799978
end_request: I/O error, dev sda, sector 84800026
end_request: I/O error, dev sda, sector 85325298
end_request: I/O error, dev sda, sector 85325314
end_request: I/O error, dev sda, sector 85325402
end_request: I/O error, dev sda, sector 94236554
end_request: I/O error, dev sda, sector 95022770
end_request: I/O error, dev sda, sector 95022962
end_request: I/O error, dev sda, sector 95023026
end_request: I/O error, dev sda, sector 95023058
end_request: I/O error, dev sda, sector 95023122
end_request: I/O error, dev sda, sector 95023218
end_request: I/O error, dev sda, sector 104984242
end_request: I/O error, dev sda, sector 122285690
end_request: I/O error, dev sda, sector 122286754
end_request: I/O error, dev sda, sector 133367024
end_request: I/O error, dev sda, sector 63
end_request: I/O error, dev sda, sector 1315863
end_request: I/O error, dev sda, sector 12058703
end_request: I/O error, dev sda, sector 14417983
end_request: I/O error, dev sda, sector 14565439
end_request: I/O error, dev sda, sector 14680143
end_request: I/O error, dev sda, sector 15728727
end_request: I/O error, dev sda, sector 15728823
end_request: I/O error, dev sda, sector 15728903
end_request: I/O error, dev sda, sector 15728975
end_request: I/O error, dev sda, sector 15728983
end_request: I/O error, dev sda, sector 15729095
end_request: I/O error, dev sda, sector 15729159
end_request: I/O error, dev sda, sector 20482875
end_request: I/O error, dev sda, sector 20482891
end_request: I/O error, dev sda, sector 133367024
EXT3-fs error (device sda6): ext3_get_inode_loc: unable to read inode block -
inode=1296651, block=1310722
end_request: I/O error, dev sda, sector 122910248
end_request: I/O error, dev sda, sector 122881248
EXT3-fs error (device sda6) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev sda, sector 122881248
end_request: I/O error, dev sda, sector 15728975
EXT3-fs error (device sda1): ext3_get_inode_loc: unable to read inode block -
inode=1946002, block=1966114
end_request: I/O error, dev sda, sector 63
EXT3-fs error (device sda1) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev sda, sector 63
end_request: I/O error, dev sda, sector 25319
end_request: I/O error, dev sda, sector 25367
Aborting journal on device sda1.
end_request: I/O error, dev sda, sector 122910272
Aborting journal on device sda6.
sda : READ CAPACITY failed.
sda : status=0, message=00, host=4, driver=00
sda : sense not available.
sda: Write Protect is off
sda: asking for cache data failed
sda: assuming drive cache: write through
ext3_abort called.
EXT3-fs error (device sda6): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
printk: 72 messages suppressed.
Buffer I/O error on device sda6, logical block 1048578
lost page write due to I/O error on sda6
Buffer I/O error on device sda6, logical block 1310722
lost page write due to I/O error on sda6
Buffer I/O error on device sda1, logical block 164469
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 164470
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 164473
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 164477
lost page write due to I/O error on sda1
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device sda1): ext3_find_entry: reading directory #1782881 offset 0
ext3_abort called.
EXT3-fs error (device sda3): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
printk: 22 messages suppressed.
Buffer I/O error on device sda5, logical block 3071
lost page write due to I/O error on sda5
Aborting journal on device sda5.
ext3_abort called.
EXT3-fs error (device sda5): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
Buffer I/O error on device sda5, logical block 1343561
lost page write due to I/O error on sda5
Buffer I/O error on device sda5, logical block 1638410
lost page write due to I/O error on sda5
Buffer I/O error on device sda5, logical block 4784130
lost page write due to I/O error on sda5
Buffer I/O error on device sda5, logical block 4915280
lost page write due to I/O error on sda5 

This is with the ohci1394 interrupt thread chrt'd up to priority 90 and the
task's priority set to 90 as well.
Comment 5 Robert Crocombe 2007-03-02 12:44:19 UTC
Created attachment 10585 [details]
Kernel config for 2.6.20-rt8
Comment 6 Robert Crocombe 2007-03-02 12:44:39 UTC
Created attachment 10586 [details]
dmesg for 2.6.20-rt8
Comment 7 Robert Crocombe 2007-03-13 19:53:22 UTC
I didn't realize how the 1394 driver was set up to handle isochronous
transmit, and used code based around two threads that I had originally used
for asynchronous receive.  I figured that the read() within
raw1394_loop_iterate() would unblock once per cycle start or something
(8k/second), but no, it was instead a straight poll, which led to me making
calls into raw1394_ioctl at a rate of about 450,000 calls per second.  After
due consideration, I have this down to 1 per packet (1200/second), and things
are working swimmingly.  Mea culpa.

Sorry for the noise.
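
For reference, the usual way to get that call rate down is to block on the
raw1394 file descriptor and only call raw1394_loop_iterate() when there is
actually an event to service.  A minimal sketch, assuming libraw1394's
raw1394_get_fd() (this illustrates the general pattern only, not the actual
code used here):

#include <poll.h>
#include <libraw1394/raw1394.h>

/* Pump libraw1394 events without busy-polling: sleep on the raw1394 fd and
 * only enter raw1394_loop_iterate() when the kernel has something queued. */
static void iterate_when_ready(raw1394handle_t h)
{
        struct pollfd pfd = {
                .fd     = raw1394_get_fd(h),
                .events = POLLIN,
        };

        for (;;) {
                if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
                        raw1394_loop_iterate(h);
        }
}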
