Most recent kernel where this bug did not occur: unknown
Distribution: Ubuntu
Hardware Environment: ppc32 Apple Mac mini
Problem Description:
After more than 10 days of uptime, during which I repeatedly mounted and unmounted my external FireWire hard drive for backups, I got the following oops today when trying to mount the drive. I have seen similar output when the cable gets bumped loose, but I have not been able to verify whether the cable was secure when this occurred. The mount command is still hung, but the box otherwise seems to be running fine.

ieee1394: sbp2: aborting sbp2 command
sd 0:0:0:0: command: cdb[0]=0x28: 28 00 09 27 c0 03 00 00 02 00
ieee1394: sbp2: aborting sbp2 command
sd 0:0:0:0: command: cdb[0]=0x0: 00 00 00 00 00 00
Oops: kernel access of bad area, sig: 11 [#1]
PREEMPT
NIP: C0023FF8 LR: C0023FF8 SP: EFC0FEB0 REGS: efc0fe00 TRAP: 0300 Not tainted
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 00000000, DSISR: 40000000
TASK = c134b230[317] 'scsi_eh_0' THREAD: efc0e000
Last syscall: -1
GPR00: 00000000 EFC0FEB0 C134B230 00000001 C05BDE28 FFFFFFFF C0650000 C0664E84
GPR08: 00040000 00000001 C13DF400 EFC0E000 C0650000 00000000 00000000 00000000
GPR16: 00000000 C02EF5A0 C06070EC C0586F7C C06070EC C0586F7C C06070EC C0650000
GPR24: 00000003 C13EE204 C13EE268 00000001 00009032 00000000 C13EE1C0 C02EF440
NIP [c0023ff8] complete+0x28/0x90
LR [c0023ff8] complete+0x28/0x90
Call trace:
[c02ef45c] scsi_eh_done+0x1c/0x30
[c03248b8] sbp2scsi_abort+0x158/0x170
[c02efbcc] scsi_send_eh_cmnd+0x10c/0x1a0
[c02efce8] scsi_eh_tur+0x88/0xe0
[c02f0930] scsi_error_handler+0x450/0xa10
[c00437e8] kthread+0x108/0x110
[c0007534] kernel_thread+0x44/0x60
note: scsi_eh_0[317] exited with preempt_count 1

Steps to reproduce: Not easily reproduced.
My working hypothesis is that there are race conditions in sbp2, because sbp2 handles much of the protocol without waiting for 1394 transactions to complete. (It does so because most of it runs in hardware interrupt or soft IRQ context.) I am planning to rework this, but don't expect fast progress.
In early April, a patch went into 2.6.17-rcX and 2.6.16.2 that cleaned up the locking regime in sbp2scsi_abort (http://marc.theaimsgroup.com/?l=linux-kernel&m=114391907428954). This _may_ have fixed it. If you ever see the problem again, please reopen this bug, or notify me if you lack the permissions to reopen it.
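To illustrate the kind of hazard meant here, the following is a minimal userspace sketch, not the sbp2 code or the actual patch. It assumes the failure mode is that a command's done callback can be driven by two paths (normal completion and the abort path) and that, without serialization, the loser may touch state the winner already tore down. The oops above faults inside complete() with DAR 00000000, which is consistent with (but does not prove) such a case. The names fake_cmd, fake_cmd_init, fake_cmd_finish and count_done are hypothetical, and a C11 atomic compare-and-swap stands in for the spinlock a kernel driver would normally use.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical command states; not taken from the sbp2 driver. */
enum cmd_state { CMD_PENDING = 0, CMD_DONE = 1 };

/* Hypothetical stand-in for a queued SCSI command. */
struct fake_cmd {
    atomic_int state;                     /* CMD_PENDING until one path claims it */
    int done_calls;                       /* how many times done() actually ran */
    void (*done)(struct fake_cmd *cmd);   /* completion callback */
};

/* Example callback: just counts invocations. */
static void count_done(struct fake_cmd *cmd)
{
    cmd->done_calls++;
}

static void fake_cmd_init(struct fake_cmd *cmd)
{
    atomic_init(&cmd->state, CMD_PENDING);
    cmd->done_calls = 0;
    cmd->done = count_done;
}

/*
 * Called by both the normal completion path and the abort path.
 * Only the caller that wins the compare-and-swap invokes done();
 * the other caller sees the command already finished and backs off.
 * Returns 1 if this caller completed the command, 0 otherwise.
 */
static int fake_cmd_finish(struct fake_cmd *cmd)
{
    int expected = CMD_PENDING;

    if (!atomic_compare_exchange_strong(&cmd->state, &expected, CMD_DONE))
        return 0;   /* someone else already completed it */
    if (cmd->done != NULL)
        cmd->done(cmd);
    return 1;
}
```

For example, if the normal completion path calls fake_cmd_finish() first, a later call from the abort path returns 0 and the callback runs exactly once. The design choice is that "who completes the command" is decided atomically in one place, rather than each path assuming it owns the command.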